key: cord-0524723-3n61w5gf
authors: Bilodeau, Blair; Stanford, David A.
title: High-Priority Expected Waiting Times in the Delayed Accumulating Priority Queue with Applications to Health Care KPIs
date: 2020-01-16
journal: nan
DOI: nan
sha: e3eed1d5de29edd9e6d931e7bb4f0d041418ff8f
doc_id: 524723
cord_uid: 3n61w5gf

We provide the first analytical expressions for the expected waiting time of high-priority customers in the delayed APQ by exploiting a classical conservation law for work-conserving queues. Additionally, we describe an algorithm to compute the expected waiting times of both low-priority and high-priority customers, which requires only the truncation of sums that converge quickly in our experiments. These insights are used to demonstrate how the accumulation rate and delay level should be chosen by health care practitioners to optimize common key performance indicators (KPIs). In particular, we demonstrate that for certain nontrivial KPIs, an accumulating priority queue with a delay of zero is always preferable. Finally, we present a detailed investigation of the quality of an exponential approximation to the high-priority waiting time distribution, which we use to optimize the choice of queueing parameters with respect to both classes' waiting time distributions.

Accumulating priority queues (APQs) are a class of queueing disciplines in which waiting customers accrue credit over time at class-dependent rates. By convention, the highest priority customers belong to class-1, and they accumulate credit at the highest rate. At service completion instants, the customer present with the greatest amount of accumulated credit is the next one selected for service. These are especially useful in highly congested systems when even a moderate proportion of the arrival load is due to high-priority customers, since with high probability there will be least one high-priority customer in system and thus low-priority customers can have extremely long wait times under a strict priority regime.

APQs are well-understood theoretically, beginning with the derivation of the M/G/1 waiting times for all classes in Stanford et al. (2014) . Further extensions include preemptive service (Fajardo and Drekic 2017) , nonlinear priority accumulation (Li et al. 2017) , hard upper limits on waiting time (Cildoz et al. 2019) , and applications to COVID-19 policies (Oz et al. 2020) . Most relevant to the present work is Mojalal et al. (2019) , in which for the first time, not all classes accumulate priority credit starting from their arrival instant. choice of parameter values (which we find to be unique in our experiments).

While the exactness of the high-priority expected waiting time is desirable for optimizing queueing parameters relative to health care KPIs, it is still unable to capture the tail behaviour of the waiting time, which may change the optimal choice of queueing parameters. Consequently, we also propose a zero-inflated exponential approximation of the high-priority waiting time, whose efficacy we demonstrate using both exact APQ waiting time CDFs and simulated delayed APQ waiting time CDFs. This provides all the information needed to fully optimize the queueing parameters for health care KPIs, subject to approximation error.

The rest of the paper is arranged as follows. Section 2 defines notation and reviews the current theoretical results relating to the delayed APQ. Section 3 contains our new analytical expressions for the class-1 expected waiting time in the delayed APQ along with a detailed analysis of the delay level's impact on this. We apply this analysis to health care KPIs in Section 4. In Section 5, we define an exponential approximation for the high-priority waiting time distribution, and evaluate its quality through numerical experiments and simulation. We also present further optimization for health care KPIs using this approximation. We conclude the paper in Section 6 with our observations and future theoretical directions. All code is available at https://github.com/blairbilodeau/delayed-apq-avg-wait.

Consider two classes of customers, labelled class-1 and class-2, where by convention class-1 has more urgency to be seen than class-2. Suppose they experience exponential inter-arrival times with rates λ 1 , λ 2 ∈ (0, ∞), so that overall customers arrive to the system at rate λ = λ 1 + λ 2 . Let S denote the common service time of any customer and 1/µ = E S. As usual, define the corresponding occupancy rates ρ 1 = λ 1 /µ and ρ 2 = λ 2 /µ, and for stability assume that ρ = λ/µ < 1. Since the relevant results depend only on the ratio of the class-2 accumulation rate to that of class-1, let class-1 customers accumulate priority at rate one credit per time unit and class-2 customers accumulate at rate b ∈ [0, 1]. Furthermore, in the delayed APQ, there is some d ≥ 0 such that all class-2 customers only begin accumulating after they have been in system for d units of time. The queueing discipline is such that at every service completion, the customer with the highest accumulated priority enters into service immediately, with no preemption, and consequently the server is only idle when the system is empty.

Denote the waiting time random variable of a customer by W, with a superscript to specify the queueing discipline and subscript to specify the priority class when required. For example, W DAPQ 1 is the waiting time of a class-1 customer in a delayed APQ, while W FCFS is the waiting time of a customer in a first-come-first-serve (FCFS) queue. For any random variable X that has distribution function F , denote the Laplace-Stieltjes transform (LST) of F byF (s) = E e −sX . We also introduce the notationF (s; d) = E e −sX 1{X > d} . Let the CDF of the service time be F S (x) = P[S ≤ x] and have LSTF S . The same superscript and subscript notation is used to denote the distribution function of a waiting time; for example, F DAPQ let τ t denote the time of the most recent service commencement. Further, let V (τ t ) denote the amount of priority accumulated at time τ t by the customer who commenced service at time τ t . Define M 2 (t) to be the maximum amount of priority a class-2 customer could have accumulated by time t, given only the previous times at which a service commenced. More precisely, M 2 (t) = 0 when the queue is empty, and otherwise it is defined recursively by

Note that the definition of M 2 (t) depends on b but not d.

A class-1 customer becomes accredited once their accumulated priority at time t is strictly larger than M 2 (t). Accreditation can never be undone, since the class-1 accumulated priority grows linearly with slope 1 while M 2 (t) grows at most linearly with slope b ≤ 1. A class-2 customer is always unaccredited by definition. Crucially, an unaccredited customer will not be served until there are no accredited customers remaining in system. Further, each unaccredited customer entering into service generates an accreditation interval, which consists of their service time plus the service times of all accredited customers served before the next unaccredited service. We denote the CDF of the random variable corresponding to the length of an accreditation interval by η and the LST byη, with a superscript for the queueing discipline. Lemma 4.2 of Stanford et al. (2014) shows that, under Poisson arrivals, customers in the APQ become accredited according to a Poisson process at rate

, which we refer to as the accreditation rate. For notational simplicity, we define ρ A 1 = λ A 1 /µ. Relative to a specific tagged customer, let N t denote the number of customers ahead of them in system (including the one in service) t units of time after their arrival, and π i = P[N 0 = i] denote the stationary distribution of the number of customers found in system upon arrival. Let R denote the residual service time of the customer currently in service upon arrival of the tagged customer, and denote its CDF by F R with LSTF R . For the random variable corresponding to the length of a residual accreditation interval, which is composed of R plus the service times of all accredited customers served before the next unaccredited service, we denote the CDF by η R and the LST byη R . For any j ∈ N, define the conditional CDF F R|j (t) = P[R ≤ t | N d = j] and the conditional LSTF R|j . Finally, conditional on N d = j, denote the conditional CDF of the residual accreditation interval length by η R|j , and denote the corresponding conditional LST byη R|j . Observe that the number in system and the residual service time are independent of the queueing discipline, depending only on the arrival rates and service distribution, while the residual accreditation interval depends on the queueing discipline, which will be denoted by a superscript as usual.

We are now able to restate the following main results from Mojalal et al. (2019) that will be used in the remainder of the paper. While the results are stated out of order from the original paper, we feel this is more natural for observing how the M/M/1 delayed APQ is a special case of the M/G/1 delayed APQ where the residual accreditation interval has the same distribution as a standard accreditation interval. 

In this section, we provide analytical expressions that can be evaluated by truncating an infinite sum to find both low and high-priority expected waiting times in the delayed APQ. These expressions allow us to visualize the impact of the delay level on the high-priority expected waiting time, providing a deeper understanding of the dynamics of the delayed APQ. Then, in Section 4, we use our computation algorithm for the analytical expected waiting time expressions to choose the optimal parametrizations for the delayed APQ under various conditions.

The primary takeaway of the following results is that we have analytical statements that can be implemented as an algorithm requiring only the truncation of infinite sums that converge quickly in our experiments. This allows for rapid testing of various parameters to tune the accumulation and delay rates to meet any KPIs of interest to practitioners. We present the results for both exponential and deterministic queueing disciplines, and while the derivation will follow the same strategy, the specific expression will change significantly for an alternative service distribution.

The first result that we use to obtain our results is the application of Corollary 3.2 from Mojalal et al. (2019) to the NPQ, corresponding to b = 0. This lemma shows that, in addition to the previously known fact that the NPQ and delayed APQ waiting times agree when within the delay period, their divergence following the end of the delay period is completely characterized by their different accreditation rates. Thus, in order to quantify the expected waiting time, we simply need to compute the expected increase from the differing accreditation rate and combine it with known expected waiting times for the NPQ.

andη NPQ R|j (s) =F R|j (s + λ 1 (1 −η NPQ (s))) . Additionally, the key fact that allows us to obtain the class-1 expected waiting time is that for work-conserving queues, the expected waiting time between classes can be related to the FCFS waiting time by their respective occupancy rates.

Theorem 1 of Kleinrock (1965) (Work-Conserving Conservation Law) For any queue with K classes, each with a Poisson arrival rate of λ k and common service distribution S, and a single-server non-preemptive queueing discipline,

Observe that the delayed APQ (which includes the APQ and NPQ as special cases) satisfies the conditions of this theorem; that is, all of these queues are work-conserving. Thus, we can apply these results to obtain analytical expressions for the average waiting time of both class-1 and class-2 customers in the M/M/1 and M/D/1 delayed APQs. The main technique is to differentiate the LST expressions for the waiting time, leading to expectations, and then compute only the difference between these terms for the delayed APQ and the NPQ. The cancellation within this difference allows for the computation of the expected waiting time rather than only the expected waiting time conditional on whether it falls before or after the delay period. The terms are then simplified to provide an explicit implementation; full derivations are provided in Appendix A.

Theorem 1 (M/M/1 Expected Waiting Time Computation).

where q = µ µ+λ 1 , p = λ 1 µ+λ 1 , r = p + qρ 2 , and ν = µ + λ 1 ; the x (k) 's are defined recursively as

2 = r 2 − p 2 , and for k ≥ 3, x

Furthermore,

Next, we consider the M/D/1 case. The additional difficulty comes from the fact that service is no longer memoryless, which leads to a more complex expression. However, once the residual service times are handled using results from Adan and Haviv (2009) , the result follows from the same method as for the M/M/1. Note that we have the same limitation as in Mojalal et al. (2019) , where the delay level must be an integer multiple of the mean service length.

.

where π i is given by

Also,

Remark 1. The formula for π i is difficult to implement efficiently for large i, but can be approximated by π i+1 /π i =σ, whereσ solves e ρσ = σe ρ (c.f. Appendix C of Tijms (1994) ).

The utility in computing this result for the M/D/1 is that it allows the effect of the service time variation on high-priority waiting times to be isolated. We may then approximate the average waiting time for a service distribution with the same mean but smaller variance than exponential service by simple interpolation between the M/D/1 and the M/M/1.

Using a truncation of the infinite sums from Theorems 1 and 2 (which is exact in the infinite sum limit), we are now able to visualize the effect of introducing a delay on the high-priority waiting time. The truncation is performed such that the individual contribution of the terms has become smaller than 10 −5 . The computations were performed in the R programming language on a 2017 Macbook Pro with 16GB of RAM, and all took (sometimes significantly) less than 5 minutes of run time. We did not conduct an extensive study of computational complexity, and instead only wish to highlight that the computation timescale is minutes rather than days, and that arbitrarily better accuracy can be achieved by sacrificing run time in favour of computing more terms in the sum.

Recall that the accumulation parameter b and delay parameter d generalize both the FCFS and NPQ regimes. Specifically, the APQ with b = 1 corresponds to FCFS, the APQ with b = 0 or the delayed APQ with d = ∞ correspond to the NPQ, and the delayed APQ with d = 0 corresponds to the APQ. Consequently, increasing the delay level will yield a waiting time between that of the APQ and that of the NPQ, where the former has the shortest class-2 waiting times and the latter has the shortest class-1 waiting times. To demonstrate this interpolation, we present the results for how changing the accumulation rate and delay period affects the expected waiting time for both class-1 and class-2 customers. Fig. 1 shows the effect of varying accumulation rate within [0, 1] on the class-1 (left panel) and class-2 (right panel) expected waiting times for M/M/1. We discuss the effect on class-1, since the class-2 values are just a vertical reflection and scaling by occupancy due to the M/G/1 conservation law. Observe that, by definition, the NPQ expected waiting time is unaffected by accumulation rate. However, for each delay level, the curve begins equal to NPQ at b = 0, and then increases sub-linearly as b tends to 1. This confirms that allowing class-2 customers to accumulate credit more rapidly penalizes class-1 customers, but reveals that this is marginally less impactful as the limit of b = 1 is approached. Furthermore, as d gets smaller, the curves shift up vertically, approaching the APQ, which corresponds to d = 0. The continuous effect of this change is explored in Fig. 3 .

The same patterns apply for the M/D/1 case in Fig. 2 , although all the waiting times are lower as there is no longer variation in the service times. It is of interest that the effect seems to be roughly halving the wait, which is exactly the impact on the expected waiting time in a FCFS queue when moving from M/M/1 to M/D/1.

Next, we are interested in the effect of changing d over different values of b in Fig. 3 , where again the left panel pertains to class-1 and the right panel pertains to class-2. The FCFS case (corresponding to b = 1 and d = 0) will provide expected waiting times that act as an upper bound for the class-1 expected waiting time. Then, starting from d = 0 (the APQ expected waiting time), each b curve decreases smoothly towards the NPQ expected waiting time, which corresponds to b = 0. While we observe that the marginal impact of increasing d always becomes smaller as d becomes very large, the initial changes are much more pronounced (steeper slope) for small values of b. The same patterns hold, with the same scaling of about 1/2, for the M/D/1 case in Fig. 4 . Again, as mentioned for Theorem 2, we can only compute this at integer multiples of the mean service length (one, in this case). 

In this section, we use our algorithm for the high-priority expected waiting time to analyze the optimal choice of parameters in a delayed APQ to meet certain health care KPIs. In particular, we are interested in the same KPIs studied in Sharif (2016), Li et al. (2019) , and Mojalal et al. (2019) . Using the Canadian Triage and Acuity Scale (CTAS), these papers define the low-and high-priority customers within an emergency department after excluding the patients who must always be seen immediately and those who have very minor ailments (and make up a negligible proportion of emergency department patients). Then, the prescribed KPIs by Bullard et al. (2017) , which are unchanged from Bullard et al. (2008) , are for CTAS-3 (class-1) patients to wait less than 30 minutes, 90% of the time, and for CTAS-4 (class-2) patients to wait less than 60 minutes, 85% of the time. Note that these KPIs correspond to sample proportions since in practice they are evaluated using only observed data, but we study them in the infinite data limit, which corresponds to the actual probabilities under the stationary queueing system. Throughout this section, we will use W DAPQ(d,b) and F DAPQ(d,b) to explicitly denote dependence on the queueing parameters of both the waiting time and corresponding CDF. Mojalal et al. (2019) incorporate the CTAS KPI by optimizing over the accumulation rate given the desired delay level. That is, given a delay level d, a waiting time target w, and a compliance probability p, they solve for

(1)

The KPI example they explicitly use is w = 4 and p = 0.8, which in the context of a mean service length being 15 minutes (Dreyer et al. 2009 ) corresponds to the smallest accumulation rate for a given delay level such that at least 80% of CTAS-4 patients wait less than one hour. However, this approach requires one to fix the delay level, and it is unclear what the optimal way to do so is. Fortunately, the additional information of the class-1 expected waiting time allows us to extend this analysis by optimizing for both d and b together. Our optimization objective, given λ 1 , λ 2 ∈ (0, 1) (taking µ = 1 for simplicity, and supposing λ 1 + λ 2 < 1 to ensure the queue is stable), is to choose d and b that minimize the class-1 expected waiting time subject to the class-2 constraint of a waiting time target and compliance probability. Specifically, we aim to find

To do so, we first identify which pairs (λ 1 , λ 2 ) have a nontrivial solution to Eq. (2). In Figure 9 of Mojalal et al. (2019) , the authors observe that the range of d values with b * (d) < 1 is quite small for their KPI at various occupancy levels. In Fig. 5 , we complete this observation by identifying all values of λ 1 and λ 2 where the optimal pair (d * , b * ) exists and does not correspond to d * = ∞ or b * = 0 for various KPI parameters. We refer to this set of values for (λ 1 , λ 2 ) as the feasible region. This simplifies the problem by reducing the number of optimizations we need to perform, and demonstrates the restrictive nature of the CTAS KPIs, since most real emergency departments operate at high total levels of occupancy.

To find the feasible region, we use two observations. First, recall that the NPQ regime leads to the lowest class-1 waiting times, so if the constraint F DAPQ 2 (w) ≥ p can be achieved by the NPQ then there is no further optimization to be done. Second, the FCFS regime uniformly results in the lowest class-2 waiting times, so if the constraint F DAPQ 2 (w) ≥ p cannot be achieved by the FCFS then the occupancy is simply too high for the KPI to be met. To visualize this, in Fig. 5 we plot the lower boundary of when the NPQ is strong enough and the upper boundary of when the FCFS is too weak to achieve the KPI for class-2 customers for both one hour (left panel) and half hour (right panel) waiting time targets with various compliance probabilities. The interpretation of these plots is that for each KPI probability level, the (λ 1 , λ 2 ) pairs to the left of the feasible region have sufficiently small occupancy such that the KPI can be achieved by class-2 customers even under the penalizing NPQ, while to the right of the feasible region it is impossible to meet the KPI. Thus, the (λ 1 , λ 2 ) pairs that require further optimization are only those that fall within this feasible region. Now, for each (λ 1 , λ 2 ) pair within the feasible region, there are multiple (d, b) pairs that can be chosen to ensure that class-2 customers meet the required KPI. To find (d * , b * ) from these, we observe that for each fixed d value, E W DAPQ(d,b) 1 is monotonically increasing with b. This observation follows by decomposing expected waiting time into the expected number of customers that will be served ahead of a tagged customer multiplied by the expected service length of each of these customers. Since b does not affect the number of class-1 customers served ahead of a tagged class-1 customer, and an increase in b increases the amount of priority each class-2 customer has (and hence the number that will be served first), this relationship holds for any arrival and service distributions. The implication of this observation is that (d * , b * ) = (d * , b * (d * )), and hence Eq. (2) can be reduced to two easier, univariate optimizations. (w). Thus, the constraint will always be active; that is, F DAPQ(d * ,b * ) 2 (w) = p. Combined with the argument of the previous paragraph, we have reduced the problem to finding the optimal d out of those for which F DAPQ(d,b * (d)) 2 (w) = p. To determine which d value is optimal to choose, we turn to our contribution of the expected class-1 waiting time, performing a univariate optimization over these d values to determine which d minimizes E W

is convex as a function of d. However, since (λ 1 , λ 2 ) is in the feasible region, there is a maximal value of d for which it is possible to achieve F DAPQ(d,b * (d)) 2 (w) ≥ p, and hence we can perform a global univariate optimization. We formalize the actual computation steps for this procedure in Algorithm 1.

To illustrate our approach, we use Figs. 6 and 7, which focus on the middle triangle of the left panel of Fig. 5 , corresponding to the KPI of P(W DAPQ 2 < 4) ≥ 0.85. In both figures, each of the three panels correspond to different (λ 1 , λ 2 ) pairs that lie in the triangle, with the x-axis enumerating the values of d for which b * (d) < 1 (that is, d ∈ [0, d max ] as defined in Algorithm 1). In Fig. 6 , the y-axis plots b * (d), while in Fig. 7 , the y-axis plots

. Concretely, Step 3 of Algorithm 1 corresponds to finding the d that is the arg min of the y-axis in Fig. 7 .

What we find in each of the panels is quite surprising, since as d increases, b * (d) increases so much that the net effect is to increase the class-1 expected waiting time. This suggests that for the purpose of meeting class-2 KPIs while optimizing class-1 expected waiting time, one should not prefer a delayed APQ over a standard APQ. Furthermore, moving left to right through the panels reveals that the detrimental effect of increasing the delay level on the class-1 expected waiting time becomes more severe as λ 2 controls a high proportion of occupancy. Finally, we note that while these figures only address specific KPI examples and the M/M/1 case, further numerical investigation showed the same conclusions hold for M/D/1 and other KPI levels.

Beyond computing expected waiting times for class-1 customers, it is of great interest to characterize the entire waiting time distribution. Currently, the theoretical tools available are insufficient for capturing the recursive dependence structure inherent to the delayed APQ, which differs from the APQ primarily by having class-1 customers experience different accreditation rates depending on the status of the queue, which breaks the necessary independence assumptions used in the analysis of the latter. Instead, we turn to finding an analytic approximation of the class-1 waiting time, which is a well-studied strategy for earlier versions of queues, but was previously inapplicable without the summary statistic computation we provide in this work. In particular, we employ a zero-inflated exponential approximation, and measure its quality via both exact numerical validation and simulation procedures. We note that the purpose of this work is to identify computational procedures for summary statistics of the delayed APQ that do not require simulation, however, we still feel it is a reasonable tool to use for validation and justification purposes. We focus on an exponential approximation in the M/M/1 case for simplicity, although based on the related work, it is plausible that the approximation would also be suitable in the M/G/1 case. The driving motivation towards using an exponential approximation is that the class-1 waiting time in both the APQ and the delayed APQ interpolates between the FCFS (longest waiting times) and the NPQ (shortest waiting times). At both of these end points, for an M/M/1 queue, class-1 waiting times are distributed according to a zero-inflated exponential. We denote such a random variable by Z ∼ Z-Exp(ρ, α), which for ρ, α > 0 has CDF defined for t ≥ 0 by

It is easy to see from Eq. (3) that P(Z = 0) = 1 − ρ. Fortunately, we know that ρ = λ/µ, so there is only one parameter to optimize. Then, by integrating P(Z > t), we obtain α = ρ/E Z. In other words, if we want Z = W 1 , we can define the zero-inflated exponential approximation with only the occupancy ratio ρ and the expected waiting time E W 1 . Fortunately, this is exactly what we have available to characterize W DAPQ 1 . Approximating queueing dynamics using exponential models has a long history in the literature. Most relevant to this work is Abate and Whitt (1995) , who also approximate waiting times with a zero-inflated exponential in the M/G/1 queue. See the references therein for other historical uses of various exponential approximations. Despite this lengthy literature, to the best of our knowledge, exponential approximations of queues have only ever been used to deal with intractability due to the service and arrival distributions. Instead, Algorithm 1: Optimization to find (d * , b * ) Inputs:

• (λ 1 , λ 2 ) in the feasible region for KPI determined by w and p;

• a function argmin : (f, (a, b)) ∈ (R → R) × R 2 → R that returns an x ∈ [a, b] achieving the global minimum of f (x) on the interval [a, b];

• a function root : (f, (a, b)) ∈ (R → R) × R 2 → R that returns an x ∈ [a, b] with f (x) = 0 on the interval [a, b] whenever one exists.

Result: Optimal queueing parameters (d * , b * ) that solve Eq. (2). \* Find the largest d value such that the KPI can be achieved by some b < 1 *\ \* In practice, d max < 5, so one need not start with MAX_FLOAT for the range

we propose an approximation to overcome intractability due to the queueing discipline.

To justify our use of an exponential approximation, we first compare its accuracy in the simpler APQ, where we also have access to the exact waiting time CDF in order to compare. In Fig. 8 , we plot the exact CDFs against the approximate CDFs for various levels of ρ. In Fig. 9 , we present this same information in another way, plotting the absolute difference between the exact CDFs and the approximate CDFs.

The approximation seems to work well when b is large, λ 1 > λ 2 , or b is very small. For extreme b (near 0 or 1), the queueing discipline is close to what one would observe in the NPQ and FCFS cases, respectively, for which the approximation is exact. When λ 1 > λ 2 , the lower priority class is not as impactful on the service dynamics as the higher priority class, and as λ 2 → 0 the approximation once more tends towards the exact solution. We also consider the same analysis from a different perspective in Fig. 9 , comparing the absolute difference between the true and approximate waiting time CDFs. Here we see nontrivial error for small t (corresponding to 1-5 average service lengths), although when ρ is large (as we expect in a health care setting), this error caps out around 5%, and similarly the error appears to be smaller when λ 1 > λ 2 .

We now perform a similar analysis of the zero-inflated exponential approximation from the last section, but for the actual queueing discipline of interest (delayed APQ). Since the true class-1 waiting time CDF is unknown (hence the approximation), we instead compare to the CDF computed via simulation. We avoid using these simulations elsewhere in the paper, as the main focus is on analytical expressions for the quantities of interest, but feel that in this section they are a warranted tool for justification. q q q q q q q q q q q q q q q q q q q q q q q q q q q q The simulations are run using a simple Python script to brute force reproduce a delayed APQ via discrete-event simulation (i.e., every "customer" is generated and explicitly moves through the queue). Each set of parameters was simulated for n = 4000 customers following a burn-in period of 1500 customers, from which empirical CDFs were computed, and then these empirical CDFs were averaged over 50 runs. The averaged, empirical CDFs were compared to the known CDFs for FCFS, APQ, and delayed APQ class-2 as validation, and found numerically indistinguishable (see Fig. 10 ).

We now reproduce Fig. 9 for the simulated delayed APQ to evaluate the zero-inflated exponential approximation in this setting. Fig. 11 demonstrates that the accuracy is mostly unchanged from the APQ setting (d = 0), with the exception that the curves are less smooth due to the stochasticity of the simulations.

In this section, we mirror the optimization procedure carried out in Section 4, but using the entire (approximate) high-priority waiting time CDF rather than only the expected value.

In Fig. 12 , we plot the feasibility region for (λ 1 , λ 2 ) pairs that require tuning of d to satisfy a high-priority KPI constraint, in contrast with the low-priority constraint considered in Fig. 5 . This feasible region is exact, since we do not require an exponential approximation to compute the FCFS and NPQ CDF of high-priority customers. Note that for certain highpriority KPIs, the feasible region takes a diamond shape rather than a simple triangle shape. This is because we are considering essentially the same targets and compliance probabilities as for class-2, but these are naturally easier to achieve for class-1, and thus a larger feasible region becomes visible.

The roles of the FCFS and the NPQ are reversed in Fig. 12 from Fig. 5 . In particular, the NPQ is more favourable to class-1 than the FCFS, and consequently the lower boundary corresponds to when the KPI can be met even in the FCFS while the upper boundary corresponds to when the KPI cannot be met even in the NPQ. By inspection, one can see that the overlap between the regions requiring optimization of d for the advocated KPIs of P(W DAPQ 2 < 4) ≥ 0.85 and P(W DAPQ 1 < 2) ≥ 0.9 is nearly negligible; it makes up a sliver of a triangle for λ 1 between 0 and 0.1 and λ 2 between 0.55 and 0.65. Nonetheless, this specific KPI pair is only a suggestion from a certain context, so we proceed with exploring optimal (d, b * (d)) pairs using the class-1 feasible regions.

We now extend Fig. 7 using our approximation of the class-1 waiting time CDF. To do so, we plot the expected class-2 waiting times for b * (d) chosen to optimize this expectation subject to still meeting the class-1 constraint. More specifically, we use

in conjunction with our zero-inflated exponential approximation to F DAPQ 1 . In Fig. 13 , we see the expected pattern that b * (d) is monotonic in d. As a consequence of our exponential approximation, the KPI at target w and compliance probability p is achieved if and only if *

Thus, since b * (d) is the largest b that achieves the KPI for a fixed d, the class-1 expected waiting time under (d, b * (d)) is always the constant value equal to the RHS of Eq. (5). If it were not, there would be some slack in the KPI (i.e., E W DAPQ 1 would be strictly smaller than the RHS), and thus b * (d) could be taken larger while still achieving the KPI, which contradicts b * (d) being the maximum such value. By the conservation law, this also keeps the class-2 expected waiting time constant, which we observe in Fig. 14. To investigate whether the fact that (d, b * (d)) keeps E W DAPQ 1 constant is only an artefact of our exponential approximation, we also computed full class-2 waiting time CDFs for (d, b * (d)) pairs. The CDFs were all numerically indistinguishable (up to 7 decimal places) within the same (λ 1 , λ 2 ) pair, which suggests that optimizing to obtain b * (d) not only results in constant expected waiting time, but also constant higher-order moments of the class-2 waiting time. Thus, we arrive at the same conclusion as we did for class-2 KPIs. Specifically, when optimizing the queueing parameters to minimize class-2 expected waiting time subject to meeting the class-1 KPI, there is no benefit to having access to the delayed APQ beyond * This follows from rearranging Eq. (3) with α = ρ/E W DAPQ 1 . the APQ. However, we do note that since the class-2 waiting time is essentially constant as a function of d, selecting a delayed APQ over an APQ may have practical differences that make it preferable.

This paper builds on previous results for class-2 waiting times by providing an analytical expression for the class-1 expected waiting time in the M/G/1 delayed APQ. We provide an algorithm that can be implemented for the M/M/1 and M/D/1 queueing disciplines, and numerically demonstrate the effect of changing the accumulation rate and the delay period. We then apply our computation algorithm for the expected waiting time to the health care setting; specifically, waiting times for patients in Canadian emergency departments. Previous analysis of KPI compliance for class-2 customers is extended by also optimizing over the expected waiting time for class-1 customers. We conclude that outside of the regions where the NPQ suffices or the FCFS fails, the optimal queueing discipline is always the APQ, or equivalently a delayed APQ with d = 0. Using a zero-inflated Exponential approximation, we also investigate optimizing KPI parameters for a class-1 constraint. We observe that this approximation seems reasonably accurate, and that the same trend holds where the optimal queueing discipline is always the APQ rather than the delayed APQ.

Major open problems include extending our analysis of expected waiting times to other service distributions of interest and identifying an exact analytical expression for the class-1 waiting time. Additionally, since this may be intractable beyond simple service distributions, it is of interest to extend our zero-inflated exponential approximations to other situations to facilitate the use of delayed APQs in real-world settings.

Proof of Theorem 1. By Corollary 3.1 of Mojalal et al. (2019) ,

Then, using the definition of LST and Theorem 3.2 of Mojalal et al. (2019) ,

where the last line follows fromη DAPQ (0) = 1. Now, similarly,

and by Lemma 1,

Consequently,

where the last line follows from differentiating the explicit forms ofη DAPQ andη NPQ . We use a continuous time Markov chain to handle this conditional probability, which is characterized by the transition matrix

such that for q = µ µ+λ 1 , p = λ 1 µ+λ 1 , and ν = µ + λ 1 we obtain

We are only interested in i and j such that the system is busy, so we define P + to be the P matrix with its first row and first column removed. Observe that P k + = (P k ) + . Now, define the following row and column vectors:

where π + is defined in view of the stationary distribution of the M/M/1 being π i = (1 − ρ)ρ i .

Then,

First, observe that π + P + = (1 − ρ) qρ 2 , pρ + qρ 3 , pρ 2 + qρ 4 , pρ 3 + qρ 5 , . . . .

That is, for all but the first term,

for r = p + qρ 2 . Repeatedly carrying out this the process and applying induction, we obtain (π + P k + ) = (1 − ρ)ρ −k r k ; ≥ k + 1.

Thus, letting x (k) = (π + P k + ) /(1 − ρ) for , k ∈ N, we can write

Focusing on the second term,

Combining Eqs. (6) and (7) gives

where the last line follows from the fact that e −rνd (rνd) k k! is a Poisson(rνd) probability mass function. Then, it remains to observe that the x (k) 's indeed satisfy the recursive formula and that from Kleinrock (1976) ,

Finally, apply the conservation law to get the class-1 expected waiting time.

Proof of Theorem 2. Using the same logic as the proof of Theorem 1 applied to Corollary 3.2 of Mojalal et al. (2019) ,

(8) Recall that the LST of deterministic service isF (s) = e −s/µ . Thus,

.

Similarly,

Then, we have the following intermediary terms to assist in computing the residual accreditation interval. Recall that d ∈ N. Let N R − be the number of customers in system immediately before the first service completion after arrival. For k ∈ N and r > 0, define

Also, for i ∈ N and r > 0, define

Finally, for k ≤ m ≤ , j ∈ Z + , and r > 0, define

Then, we define

Recall that since the service length is always 1/µ, the unconditional residual service time is R ∼ Unif(0, 1/µ). For each j ∈ N,

Thus, plugging this into Eq. (8) gives

It remains to compute these integrals. To do so, observe the following lemma:

Then,

π k−n λ j+ +n−k 1 µ n+1 (n + a + 1)n! .

Next,

π k−n λ j+ +n−k 1 (n + a + 1)n! .

Thus,

π k−n λ j+ +n−k 1 (n + a + 1)n! .

The exact same calculations with an additional r inside the integral gives

π k−n λ j+ +n−k 1 (n + a + 2)n! .

Plugging these last two results in along with the M/D/1 average waiting time from Kleinrock (1976) gives the first statement of the theorem.

Finally, consider the case where d = 0. First, observe that

Next, we have the following result from Adan and Haviv (2009):

Thus, using the average queue length for an M/D/1 queue,

. < 2) ≥ 0.9. q q q q q q q q q q q q q q q q q q q q q q q q q q for (d, b * (d)) pairs associated with various occupancy levels for the KPI P(W DAPQ 1 < 2) ≥ 0.9.

Exponential approximations for tail probabilities in queues, i: Waiting times

Conditional ages and residual service times in the M/G/1 queue

Revisions to the canadian emergency department triage and acuity scale (CTAS) guidelines 2016

Revisions to the canadian emergency department triage and acuity scale (CTAS) guidelines

Accumulating priority queues versus pure priority queues for managing patients in emergency departments

Physician workload and the canadian emergency department triage and acuity scale: The predictors of workload in the emergency room (POWER) study

Waiting time distributions in the preemptive accumulating priority queue

A conservation law for a wide class of queueing disciplines

Optimising key performance indicator adherence with application to emergency department congestion

Nonlinear accumulating priority queues with equivalent linear proxies

The lower-class waiting time distribution in the delayed accumulating priority queue. INFOR: Information Systems and Operational Research

Static vs accumulating priorities in healthcare queues under heavy loads

Probability Models for Healthcare Operations with Application to Emergency Medicine

Waiting time distributions in the accumulating priority queue

Stochastic Models: An Algorithmic Approach

This work is supported by an NSERC Canada Graduate Scholarship and an NSERC Discovery Grant. The authors are grateful to Dr. Doug Down for his insights regarding the geometric tail decay rate of the M/D/1 queue length, and to Elisheva Schwarz-Zur for pointing out that the statement of Theorem 1 was incomplete in a preliminary draft.