title: How to Staff when Customers Arrive in Batches
authors: Daw, Andrew; Hampshire, Robert C.; Pender, Jamol
date: 2019-07-30

In settings as diverse as autonomous vehicles, cloud computing, and pandemic quarantines, requests for service can arrive in near or true simultaneity with one another. This creates batches of arrivals to the underlying queueing system. In this paper, we study the staffing problem for the batch arrival queue. We show that batches place a significant stress on services, and thus require a high amount of resources and preparation. In fact, we find that there is no economy of scale as the number of customers in each batch increases, creating a stark contrast with the square root safety staffing rules enjoyed by systems with solitary arrivals of customers. Furthermore, when customers arrive both quickly and in batches, an economy of scale can exist, but it is weaker than what is typically expected. Methodologically, these staffing results follow from novel large batch and hybrid large-batch-and-large-rate limits of the general multi-server queueing model. In the pure large batch limit, we establish the first formal connection between multi-server queues and storage processes, another family of stochastic processes. By consequence, we show that the limit of the batch scaled queue length process is not asymptotically normal, and that, in fact, the fluid and diffusion-type limits coincide. This is what drives our staffing analysis of the batch arrival queue, and what implies that the (safety) staffing of this system must be directly proportional to the batch size just to achieve a non-degenerate probability of customers waiting.

Even the best laid plans can go astray. Perhaps the most fundamental and universal pillar of queueing theory's contributions is the recognition that "congestion [is] a stochastic phenomenon" (Kingman 2009). Uncertainty and variability are inherent to service systems, and thus probability models have become natural tools to understand the structure of these operations and identify managerial insights within them. Some of the clearest and most useful examples of this preparation for randomness lie in staffing decisions. Often referred to as safety staffing rules, these regimes prescribe a certain level of additional staffing on top of the expected number of customers in the system if the staffing was unlimited. Much like an inventory safety stock may balance the tradeoff of holding costs and ordering costs, the safety staffing level weighs some quality-of-service metric, such as the fraction of customers that wait, against some measure of operational efficiency, such as the total staffing cost or simply the total number of servers employed. In this paper, we show that batch arrivals place a particularly strong stress on service systems, requiring a high level of resources to achieve what is typically only a tradeoff-compromising quality of service. Classically, three of the most commonly contrasted staffing paradigms are the quality driven (QD), quality-and-efficiency driven (QED), and efficiency driven (ED) regimes. All three of these are designed for heavily trafficked services that receive a large rate of arriving customers. As the names suggest, these each land at different points on the quality and efficiency tradeoff.
The QD regime places a premium on the customer experience, maintaining a high number of servers to ensure that essentially no customer will wait. On the other hand, the ED regime prioritizes system efficiency, with a relatively low number of servers yielding that virtually all customers will wait. Striking a balance between these two, the QED regime uses a medium level of safety staffing to create an environment in which some, but not all, customers wait. We will formally review each of these regimes and their surrounding literature in Section 2.1; an excellent and extensive review is available in van Leeuwaarden et al. (2019). In the language of these regimes, our results show that batch arrivals of customers require QD-like staffing just to achieve QED-like probability of delay. Our original motivation for studying the staffing problem for the batch arrival queue actually lies in the management of autonomous vehicles, or more precisely in autonomous vehicle remote assistance systems. These infrastructures have arisen out of the concern that driverless vehicles may never achieve full autonomy. Instead, there is suspicion both inside and outside the industry that autonomous vehicles may always need some human help. That is, it is suspected that driverless technology may plateau at a degree of capability that allows for full autonomy only in well-controlled and understood environments, and otherwise would rely on human help when pushed outside of that comfort zone (see, e.g., Kalra and Paddock (2016), Tibken (2018)). Teleoperation or remote assistance systems aim to ameliorate this and help autonomous vehicles deliver on the immense societal and economic promise envisioned by, e.g., Burns (2013), Lanctot (2017), Jadhav (2018). The idea is essentially a call center for driverless cars, where remote human agents provide either passive feedback (remote assistance) or active driving input (full teleoperation) when the vehicles request it (Sawers 2018, Davies 2019, Rosenszweig 2019, Hampshire et al. 2020, Mutzenich et al. 2021). For example, Waymo, the industry-leading self-driving vehicle arm of Alphabet Inc., describes a remote assistance "fleet response team" in the following excerpt from their safety methodology report (Webb et al. 2020): "While Waymo's [Automated Driving System (ADS)] is responsible for making every driving decision on the road, we have a fleet response team that can provide remote assistance to the ADS if needed. Our ADS is designed to recognize unexpected situations and contact our fleet response team, who can confirm what the ADS is seeing and provide additional contextual information. Our ADS only asks these questions to gain a deeper understanding of what it has already detected and perceived. In some situations our remote assistant may direct the AV to pull over and stop (e.g., due to a need to await further direction as a result of a road closure far ahead), but the ADS continues to perform the entire [Dynamic Driving Task]." Webb et al. (2020) also emphasizes that this is not proper teleoperations: the remote assistants are not actively driving the vehicle, but rather supporting it through input and feedback when the vehicle encounters overwhelming uncertainty. Similar passive controls have been considered by other firms (Sawers 2018).
In previous work oriented towards broad audiences, we reasoned about how this could work in a look-ahead fashion at each support request through simulating many possible near-future states and providing assistance feedback for each (Hampshire et al. 2020). For safety's sake, this could even involve some instantaneous crowd-sourcing, in which each future state is actually evaluated by multiple agents. Much like Webb et al. (2020) describes, this would actually require support from a team of agents, or, more precisely, create a batch of tasks to be served. Hence, the question "how many remote assistants are needed to support an autonomous vehicle fleet?" becomes "how do you staff a batch arrival queue?" Similar humans-in-the-loop applications of batch arrival queues appear in many other modern digital services, such as data labelling and fact checking. However, the relevance of this question of course needs not be limited to only remote assistance of autonomous vehicles or humans-in-the-loop settings. In fact, this is not even the only autonomous vehicle topic in which batch arrival queues are relevant. Many major self-driving car firms have developed their driverless algorithms based on sophisticated search methodologies (Chen 2019, Silver 2021). These algorithms often employ simulation or machine learning procedures called en masse to evaluate potential next states or actions. Driverless cars are of course not the only applications that use such methodologies; even just a brief review of Monte-Carlo-based methods shows contexts as diverse as wildfire management, queueing control, optimal stopping, pricing, Go and other games, data processing, and data center management, among many others (Browne et al. 2012, Xie et al. 2016, Bertsimas et al. 2017, Chen and Goldberg 2018, Shah et al. 2020). Generally speaking, the batch arrival queueing structure arises from the multiplicity of the simulation replications, machine learning parallelizations, or other collective calls of the computing procedures. Here, each arrival to the system actually constitutes many jobs or tasks to be processed, rather than just one. Hence, like the remote assistants problem before, the question "what level of computing resources do we need to routinely run these algorithms?" also now becomes "how do you staff a batch arrival queue?" Of course, there are additional applications of batch arrival queues across a myriad of other contexts. For example, in the Covid-19 era, many universities have allocated quarantine space for infected students to isolate in campus hotels (Auerbach et al. 2020, Fox et al. 2021, Giufurta and O'Connell 2021, Gluckman 2021). Because tests are conducted in batches, positive cases are found in batches; and thus the batch staffing question is equivalent here to asking if enough hotel rooms have been set aside. Many classic examples of batch arrival queues lie in transportation, especially in public or mass transit. For a particular example that these authors have (unfortunately) experienced, customers from the same flight arrive to an airport car rental desk all at once, which places a large stress on the system and thus can create lengthy wait times. Beyond truly simultaneous batches, it is increasingly common that heavily trafficked services may actually receive an arrival stream that is gradual then sudden, varying significantly across time.
This has been observed as a naturally occurring phenomenon of customer arrivals across many different application domains (see, e.g., Aksin et al. (2007), Kim and Whitt (2014), Ibrahim et al. (2016)). In such settings, arrival rates have been observed to be over-dispersed over short intervals of time, leading to many arrivals in a brief period. Intuitively, sufficiently rapid bursts of arrivals should function like a truly simultaneous batch arrival; we make formal arguments to support this in Section 3. In sum, all these applications serve as inspiration for us to study the staffing problem for the batch arrival queue. What is a good place to start here? One tempting approach could be to recognize the effective customer arrival rate as the product of the batch arrival rate and the batch size, and then plug this effective rate into classical solitary arrival staffing formulas. On the other hand, another tempting approach could be to recognize that a batch of size n could be split to n separate queueing systems. Hence, we could find the staffing level for a queue with solitary arrival rate given by the true batch arrival rate and then calculate the batch staffing level by multiplying this solitary staffing level by the batch size. In Figure 1, we can see that both of these approaches miss the mark, with the former leading to severe under-staffing, and the latter, over-staffing. In hindsight, it's not hard to find counter-arguments for each of these approaches. For the first method, while it is true that the arrival rate is the average rate that customers enter the service system, that does not address the fact that many of these customers enter simultaneously, leaving many of the effective inter-arrival times equal to 0. This will lead to under-staffing because it under-estimates the variability of the customer arrival process. Then, in the second method, it is true that n customers arrive at once, but this neglects the potential pooling benefits from having a centralized queue that receives all customers rather than each server having their own separate, dedicated queues. Hence, Figure 1 illustrates the importance and intrigue of this problem. As the batch size increases, the performance of each solitary staffing heuristic weakens. However, in the second method, in which the staffing level is directly proportional to the batch size or, equivalently, to the effective arrival rate, the staffing exceedance probability does not appear to be converging to 0, which would be the case in solitary arrival systems according to the QD regime. Intuiting that this means that large batch arrivals create distinct challenges not reproduced by large arrival rates, we will approach the staffing problem by studying the limit of the queueing model as the batch size increases. Through our analysis, we will be able to explain precisely why each of the methods in Figure 1 won't work and, more importantly, rigorously identify how to properly staff a batch arrival queue.

[Figure 1: For a target probability of ε = 0.01, the "Arrival Rate" staffing levels are calculated with arrival rate λn and service rate µ, while the "Servers" staffing levels are calculated by multiplying the staffing level for arrival rate λ and service rate µ by n.]

In their seminal work, Halfin and Whitt (1981) remarked that "[t]he balance between service and economy usually dictates that the probability of delay be kept away from both zero and one, so that the number of customers present fluctuates between the regions above and below the number of servers."
Here, we find that if the traffic is large through its simultaneity rather than merely through its rate, it may take as much as QD-level resources to achieve QED-type performance. Specifically, in terms of managerial insights, we have a two-pronged message in this paper:
i) In queueing systems with large batch arrivals, QD-style staffing, in which the number of servers is exactly proportional to the effective customer arrival rate, leads to QED-style performance, in which the limiting process is random (Theorem 2). That is, there is a non-degenerate probability that the queue length exceeds the number of servers (Corollary 2). Hence, there is no economy of scale for staffing as the batch size grows large.
ii) The need for stronger-than-square-root staffing remains even when both the arrival rate and the batch sizes are large. A hybrid batch-size-and-arrival-rate scaling reveals a spectrum of asymptotic regimes, book-ended by the classical QED heavy traffic scaling at one extreme and by our proposed multi-server large batch scaling on the other (Theorem 4). For the latter end and interior of this spectrum, the safety staffing level is of order strictly larger than the square root of the effective arrival rate. Therefore, when both the batch size and the batch arrival rate grow large there is a weaker economy of scale relative to what is typically found when just the arrival rate grows large (such as from the QED regime).
Methodologically, the large batch staffing is theoretically justified by a novel batch scaling limit of the multi-server queue, the first of its kind for a finite server model and significantly more general than prior infinite server results. This scaling yields a connection from batch arrival queues to storage processes, another family of continuous time stochastic processes. The limit is similar to a fluid scaling in that we scale inversely by the index of the limit, but the normalization is through the batch size, rather than the arrival rate of batches. Hence, the arrival epochs are untouched, which preserves the randomness in the limit. Both the batch scaling and the hybrid limits of the multi-server queue are scaffolded from first establishing results for the infinite server analog of the model. The hybrid scaling reveals what we find to be an interesting and subtle point about the large batch scaling. That is, there is an asymptotic normality for the limit of the infinite server queue throughout the interior of the spectrum and on the pure arrival rate extreme, but, on the other extreme, there is not an asymptotic normality in the pure batch scaling (Theorem 3). Moreover, this shows that the fluid and diffusion type scalings actually coincide for the pure batch scaling, which follows from the fact that the mean and standard deviation of the queue are directly proportional throughout the limit. The remainder of this paper is organized as follows. In Section 2, we survey the literature and the service operations context, with particular attention given to reviewing the QD, ED, and QED regimes and to prior work on batch arrival queues and on storage processes. Following this, we define and discuss the general batch arrival queueing model in Section 3. The main two sections of analysis are Sections 4 and 5. Section 4 contains the large batch analysis, in which we show that general infinite server and multi-server batch arrival queues converge to shot noise and storage processes, respectively.
The batch arrival queue staffing techniques and the associated absence of an economy of scale follow immediately. Section 5 is then devoted to the hybrid batch scaling, considered in the Markovian setting so as to make the underlying relationships clear. Here we first prove strong law of large numbers and central limit theorem type limits for the infinite server queue, showing that asymptotic normality only appears when the arrival rate is included in the limit. In this case, we can explicitly provide the limiting probability that the queue length exceeds the number of servers. All proofs are contained in the appendix, as are additional supporting and auxiliary results. There are two primary streams of literature from which this paper draws inspiration, and to which it hopes to contribute: operational staffing regimes and the analysis of batch arrival queues. We survey the former of these in Subsection 2.1, along with their associated heavy traffic limits, and in Subsection 2.2 we review the literature on batch arrival queues, as well as other relevant stochastic processes like the storage process and the fork-join queue. To set the stage for this paper, let us briefly review the operational context of the QD, ED, and QED regimes. For more depth than the space here allows, readers should see van Leeuwaarden et al. (2019) for a recent survey and nice tutorial of the literature for the QED regime and the corresponding heavy traffic limits, containing much of the state of the art of this methodology and its use in surrounding problems. In this subsection, we will first sketch the three regimes and then highlight prior works of particular interest and relevance. As we have mentioned in the introduction, each of these regimes is based upon the theory of heavy traffic limits, meaning asymptotics of the multi-server queue length process as the arrival rate grows large. Hence, we will frame these three in terms of this arrival rate, say λ. Walking through the arrival rate, the staffing level, and the system performance, these three regimes can be summarized as follows:
• Quality Driven (QD):
- solitary arrivals at rate proportional to λ
- number of servers c(λ) proportional to λ + δλ for some δ > 0
- in this limiting regime, no customers wait and P(Q(λ) ≥ c(λ)) → 0
• Efficiency Driven (ED):
- solitary arrivals at rate proportional to λ
- number of servers c(λ) proportional to λ + δ for some δ > 0
- in this limiting regime, all customers wait and P(Q(λ) ≥ c(λ)) → 1
• Quality and Efficiency Driven (QED):
- solitary arrivals at rate proportional to λ
- number of servers c(λ) proportional to λ + δ√λ for some δ > 0
- in this limiting regime, some customers wait and P(Q(λ) ≥ c(λ)) → h(δ) ∈ (0, 1)
The second bullet under each of the regimes provides context for the staffing problem we will consider throughout the paper, and together the second and third bullets can explain the nomenclature. That is, the QD regime bears the name "quality" because of the absence of waiting in the limit. By contrast, the ED regime achieves "efficiency" because the staffing is just slightly above the incoming rate of customer traffic, protecting system stability without committing to much else. Finally, QED strikes a balance between the two with its so-called "square root (safety) staffing rule." Some customers wait, but not all do, and the staffing level grows with arrival rate, but it's not directly proportional to it. In other words, doubling the arrival rate would call for staffing less than double the number of servers.
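To make these three regimes concrete, the following sketch (our own illustration, not drawn from the paper) computes the stationary M/M/c delay probability under each regime's staffing rule using the standard Erlang-C formula; the service rate µ = 1 and the margin δ = 0.5 are arbitrary assumptions.

```python
import math

def erlang_c(c: int, a: float) -> float:
    """Erlang-C delay probability P(all servers busy) for an M/M/c queue
    with offered load a = lambda/mu, via the numerically stable
    Erlang-B recursion (avoids computing factorials directly)."""
    if a >= c:
        return 1.0  # unstable system: everyone waits in the limit
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)            # Erlang-B recursion
    rho = a / c
    return b / (1.0 - rho * (1.0 - b))     # convert Erlang-B to Erlang-C

mu, delta = 1.0, 0.5                        # assumed service rate and safety margin
for lam in [10, 100, 1000, 10000]:
    a = lam / mu                            # offered load
    qd  = math.ceil(a + delta * a)              # QD:  c = a + delta * a
    ed  = math.ceil(a + delta)                  # ED:  c = a + delta
    qed = math.ceil(a + delta * math.sqrt(a))   # QED: c = a + delta * sqrt(a)
    print(lam, erlang_c(qd, a), erlang_c(ed, a), erlang_c(qed, a))
```

Running this displays the three signatures described above: the QD delay probabilities vanish as λ grows, the ED probabilities climb toward one, and the QED probabilities settle near a constant strictly between zero and one.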
See Figure 12 in Gans et al. (2003) for a comprehensive summary of the operational performance of these different regimes, as well as an excellent survey of relevant call center research, where these regimes have been of particular practicality. There has been a quite rich history of work analyzing these operational regimes; we provide here a brief sampling. Starting with the foundation, Halfin and Whitt (1981) is universally recognized as the analytical cornerstone of QED staffing. Indeed, the QED regime is also often referred to as the Halfin-Whitt regime, and it is the heavy traffic limit established by Halfin and Whitt (1981) that first formally established the square root staffing rule, solidifying what had been folklore for many years prior. Furthermore, the analysis therein continues to serve as a blueprint for research in these settings, as will be the case here. Also quite relevant here are Borst et al. (2004) and Garnett et al. (2002), which both classify and contrast the three operational regimes (QD, ED, and QED). Jennings et al. (1996), Green et al. (2007), and Feldman et al. (2008) find similar success of square root staffing in the case of time-varying arrival rates; see Whitt (2007) for a survey specifically devoted to the staffing problem in time-varying settings. There has also been a great deal of work extending these concepts from their Markovian model origins to greater generality of arrival and service time distributions. To this end, the analysis translating the QED regime and Halfin-Whitt limits to the G/GI/c queue in Reed (2009) will be of particular importance to our analysis here, since the G^B_t/GI/c queue is the setting for our large batch limits. Our work here also joins a stream of literature that considers alternatives outside of the QD, ED, and QED regimes, as well as ones that live between them. For example, Bassamboo et al. (2010) considers parameter uncertainty in the arrival rate and finds that for low uncertainty the square root staffing rule still performs well, but in the case of high levels of uncertainty it can be outperformed by newsvendor-style methods. Gurvich et al. (2010) approaches demand rate uncertainty through a chance-constrained optimization problem, where quality of service requirements are addressed through high probability guarantees, rather than average performance guarantees. Jongbloed and Koole (2001), Mathijsen et al. (2018) are similarly interested in addressing uncertain and overdispersed demand. The non-degenerate slowdown regime in Atar (2012), Atar and Solomon (2011), Atar and Gurvich (2014) is of particular relevance for our hybrid batch-and-arrival-rate limits, as here we also construct the limit through an extra parameter housed in the exponent of two separate components of the queue. By comparison to Atar (2012), these two are the arrival rate and the batch size, rather than the number of servers and the service rate. Another important connection for both our large batch and hybrid scalings is to peakedness approximations (see, e.g., Whitt (1992), Massey and Whitt (1996), Pang and Whitt (2012) and references therein). The staffing levels we identify in this work align with the order of magnitude of peakedness approximation staffing for both the large batch and the hybrid scalings, but we make two important emphases here. First, in both scalings, the terms which are analogs of the peakedness parameter are also increasing in the limit as the mean arrival rate increases.
Second, and perhaps more importantly, in the large batch scaling the peakedness approximation does identify the correct order of magnitude, but there is no asymptotic normality. In fact, as we remark in Section 4, the limiting distribution in the large batch scaling is strictly greater than sub-Gaussian. The large batch setting in this paper separates our work from previous queueing theoretic studies on batch arrival multi-server queues, which either assume a less general arrival process than we consider here or, perhaps more critically, assume bounded batch sizes. For example, Neuts (1978) and Baily and Neuts (1981) each use matrix-geometric approaches to study the GI^B/M/c queue under the assumption that the batch size distribution B is bounded, with Neuts (1978) considering the stationary setting and Baily and Neuts (1981) the transient. In each setting, the boundedness of the batch size is essential, as this bound dictates the size of the underlying matrices. This bounded batch size assumption is also used to study the GI^B/M/c model in Zhao (1994) and Chaudhry and Kim (2016), with the former giving explicit expressions for the generating function and an equation satisfied by the steady-state probabilities and the latter providing efficient computational methods while also simplifying the approach of the former. Again the bound is essential, as these approaches are built upon root-finding methods where the number of roots is equal to the batch size. By comparison, the general unbounded batch size setting has often called for approximate approaches, such as the bounds on the GI^B/G/c system that were constructed in Yao et al. (1984) through comparison to single arrival queues. Yao (1985) then gives tighter bounds for the M^B/M/c queue using the M^B/G/1 system and demonstrates that these bounds can be used to approximate the GI^B/G/c. Computational methods have also been provided in Cromie et al. (1979) for the fully Markovian setting, the M^B/M/c queue, although these were only done for three specific batch distributions: constant size, geometric, and Poisson. In studying this large batch setting we will prove batch scaling limits of the queue, in which the batch size and the number of servers grow large and the queue length is scaled inversely. Through the batch scaling limits, we connect the general batch arrival queueing models to storage processes, another class of stochastic processes. Similar albeit less general scalings have been explored recently in de Graaf et al. (2017), Daw and Pender (2019). Specifically, the limits we prove in this work for the G^B_t/GI/c queue generalize the batch scaling results of M^B/M/∞ queueing systems shown in de Graaf et al. (2017), Daw and Pender (2019), which converge to shot noise processes with exponential decay. Let us emphasize that the limits in de Graaf et al. (2017), Daw and Pender (2019) are inherently tied to the Markovian assumption. Both prior works draw upon tools specifically for Markov processes, like the infinitesimal generator and the associated partial and ordinary differential equations. Here, in addition to extending to the multi-server setting, we also adopt greater generality of distributions and thus do not have such methodology available to us. To prove this generalization beyond the Markovian setting, we develop an approach that is entirely agnostic to the arrival epoch process, which is what enables our results to be immediately applicable to queues with time-varying and/or correlated inter-arrival times.
This approach of leveraging the infinite server queue to understand the multi-server system is similar to the techniques used by Reed (2009) to extend the Halfin-Whitt heavy traffic limits to non-Markovian service durations. Indeed, Reed (2009) serves as a key predecessor and inspiration for our proof methodology in the large batch scaling. The limiting relationship between infinite server queues and shot noise processes was also discussed as motivation in Kella and Whitt (1999), although this relationship was presented without proof. This connection allows us to make use of a broad literature on storage processes, which can be seen as a generalization of shot noise processes. Storage processes, which can also be referred to as dams, content processes, or even fluid queues, are positive valued, continuous time stochastic processes in which the process level will jump upwards by some amount at epochs given by a point process. Between jumps the process will decrease according to some function of its state. In generality, the release dynamics may also be a function of the history of the process rather than just the current state; such a setting will be necessary to study the multi-server queue's limiting form in the case of general service. Many of the results that will be most relevant here are focused on the stationary distributions of storage processes. Even on its own the study of stationary distributions of storage processes has a rich history, with early work including expressions of stationary distributions for shot noise processes given in Gilbert and Pollak (1960). Later work found similar results for more general settings, including Cinlar and Pinsky (1972), Yeo (1974, 1976), Rubinovitch and Cohen (1980), Kaspi (1984). A line of study that will be particularly useful for us can be found in Brockwell (1977), Brockwell et al. (1982), as these works find integral equations for the stationary distributions of storage processes in generality. These forms will be of great use to us in our staffing analysis. For precursors to this work in a different but no less interesting setting, see Resnick (1976, 1978). Connections between queues and storage processes are not new in general, as the single server queue has been known to have a workload process that is a storage process. However, to the best of our knowledge, our work is the first connection between multi-server queue models and storage processes. For an overview of these connections and the related ideas, see Prabhu (2012). Another interesting and closely related process to the batch arrival queue is the fork-join queueing model, which may hold relevance for many similar applications while also being inherently distinct from the models we study here. At its most elemental, the fork-join queue functions as follows. Upon each arrival a job is split into k parallel tasks, each one routed to one of k separate servers. Each task waits for service, is served, and then waits to be re-joined with the rest of the tasks in the job. Once all k tasks have been completed by their respective servers and re-joined, the job is considered complete. Aside from simply being an intriguing stochastic model to analyze (e.g., Baccelli et al. (1989), Lu and Pang (2017)), the fork-join queue holds relevance for many applications similar to those we have described. In comparing fork-join and batch arrival queues, the key differences lie in the structuring of the waiting and in the post-processing.
For the former, it is an issue of centralization versus decentralization: the batch arrival model has one pooled queue that feeds jobs or tasks to servers as they become available, whereas the fork-join model has one queue per server or station. Then, for the latter, the batch arrival queue by default does not include a synchronization step re-joining the jobs at departure. For the purposes of the staffing problem, we consider the centralization to be the more important difference. While the synchronization may prompt a different focus in performance metrics, if the step can be automated it may not actually require its own server. By comparison, the pooling principle posits that there are meaningful differences between centralized and decentralized queueing structures, and, following that intuition, the staffing requirements we find here should be a lower bound on what is needed in matching fork-join systems. To distinguish our setting from the literature on batch arrival queues and their close relatives, let us clearly define the batch arrival queueing model we analyze here in full generality. We do so in Subsection 3.1, in which we establish the notation we will use throughout the paper. We also believe our results have high applicability for burst arrival queues, and we make brief arguments in favor of this in Subsection 3.2. In Kendall notation, the general model we study in this paper is the G^B_t/GI/c queue. That is, arrivals to the queueing system occur in batches drawn from a sequence of independent and identically distributed positive integer-valued random variables, {B_i(n) | i ∈ Z^+}. We will refer to n as the relative batch size; the limits in Section 4 will be indexed by n. These batch arrivals occur at epochs given by some general and possibly time-varying point process, hence we will let N_t be the number of epochs that have occurred by time t for all t ≥ 0. Similarly, let {A_i | i ∈ Z^+} denote the arrival epochs. Then, we will let c > 1 be such that cn ∈ Z^+ is the number of servers, meaning that the staffing level grows with the relative batch size. The servers will serve customers in a First-Come-First-Serve discipline. There is unlimited waiting space. We will assume that service durations are drawn from a sequence of independent and identically distributed positive random variables, {S_{i,j} | i, j ∈ Z^+}, indexed first by the batch in which the specific customer arrived (i) and then by the order in which customers from this batch entered service (j). We will let G(x) = P(S_{1,1} ≤ x) for all x ≥ 0, which does not depend on n. Additionally, let Ḡ(x) = 1 − G(x) for all x. Following the same indexing, we will let W_{i,j} be the time that the j-th customer within the i-th batch waits before beginning service. Using these components, the specific stochastic process we will study will be Q^C_t(n), which is the queue length process (meaning the total number in system, including the customers in service and those waiting) for the general batch arrival multi-server queue at time t ≥ 0 for relative batch size n. The C superscript refers to this model as the general batch arrival analog of the classical Erlang-C model. This superscript is of particular relevance in distinguishing the model from a close cousin that we will use as a stepping stone in our analysis: the G^B_t/GI/∞ queue. Let us define Q^∞_t(n) as the analogous infinite server queueing model as a counterpart to Q^C_t(n).
That is, Q^∞_t(n) tracks the queue length or total number in system under the same assumptions on batch sizes, arrival epoch process, and service distributions, with the single (but important) difference being that there is an unlimited number of servers. Hence, no customers will wait for service, rendering Q^∞_t(n) an idealized form of Q^C_t(n) that is more tractable for analysis. We will assume that the initial conditions, Q^C_0(n) and Q^∞_0(n), are known for all n. With this notation in hand, we can also now define the staffing problem that is at the heart of this paper. For some ε > 0, we seek to find a c > 1 such that the probability that the queue length exceeds or equals the number of servers is at most ε, i.e.

P(Q^C_t(n) ≥ cn) ≤ ε.   (1)

We will refer to P(Q^C_t(n) ≥ cn) as the exceedance probability. This can also be thought of as the probability that all servers are busy at time t. The solitary arrival analog of the exceedance probability, say P(Q^C_t(1) ≥ c), is often referred to as the "delay probability" in the case of stationary Poisson process arrivals, since the famous PASTA theorem implies that this is also the probability that an arriving customer will have to wait for service (Wolff 1982). Even setting the lack of a Poisson assumption aside, we can observe that the event {Q^C_t(n) ≥ cn} does not offer such guarantees in the case of batch arrivals. Instead, this would correspond to the event that all customers within an arriving batch would have to wait. Hence, one could instead consider other performance metrics that require a higher standard of service, such as the probability that some customers wait, P(Q^C_t(n) + B(n) ≥ cn), or more generally, P(Q^C_t(n) + pB(n) ≥ cn) for some p ∈ (0, 1). We will stick to the exceedance probability as defined in (1), as even the weakest of these requirements will be enough to create the strong staffing requirements to which we have alluded, but much of our analysis can be carried through directly for similar events. At various points we have claimed that our staffing analysis will also apply to rapid bursts of arrivals, rather than just to batches of arrivals. Intuitively, if a burst or cluster of arrivals is very nearly simultaneous, then this should be quite similar to a truly simultaneous batch of arrivals. Here, we will introduce a pair of stylized models and make brief arguments in favor of this. Consider two parsimonious Markov models of bursty arrivals; one exogenously driven and one endogenously driven, each ephemeral. For the exogenous case, suppose that at time 0 an external event occurs which spurs arrivals downstream according to a Poisson process with rate α_Exo > 0. However, at some independently and exponentially distributed expiration time, the external event concludes and the Poisson arrival stream ceases, ending the burst. Suppose that this expiration time has mean 1/β_Exo > 0. Let τ_Exo ≥ 0 be the duration, or the time of the last arrival, of the exogenously driven burst, and let Z_Exo ≥ 0 be the size, or the number of arrivals that occur, of this burst. Now, for the endogenously driven model, we will mirror the exogenous model in a self-exciting fashion. Let an initial arrival occur at time 0. Suppose this arrival generates downstream arrivals according to a Poisson process at rate α_Endo > 0, and suppose further that all future arrivals do as well, with all Poisson streams being mutually independent.
Each of these streams ceases after an independently and exponentially distributed time has passed since the given stream's initial arrival, and we will suppose that the rate of these exponential random variables is β_Endo > α_Endo. Like in the exogenous model setting, let τ_Endo be the duration of the burst and Z_Endo its size.

Proposition 1. Suppose that α_Exo, β_Exo → ∞ and α_Endo, β_Endo → ∞ with α_Exo/β_Exo and α_Endo/β_Endo fixed. Then, the bursts become instantaneous, i.e., τ_Exo → 0 and τ_Endo → 0, while the distributions of Z_Exo and Z_Endo are unchanged.

These models are both simple, but the underlying principle should hold merit in greater generality. In each of these continuous Markov chain models, the α and β rates can be thought of as the underlying burst events occurring on smaller and smaller timescales. Relative to the timescale of staffing decisions, we think this is quite natural. That is, bursts may take place on the order of seconds or minutes, whereas staffing decisions typically last for several hours. Matching that reasoning, Proposition 1 suggests that we can treat such a burst as a batch, as the size distributions do not change throughout the scaling. Of course, in applications like autonomous vehicle remote assistance, the arrival stream need not be exclusively bursts nor exclusively batches, and may actually be bursts of batches. Again, in such cases, Proposition 1 suggests we can think of this as batches of batches, which are simply batches with a different size distribution. Hence, we will stick to strictly studying batch arrivals. To rigorously determine how to staff queues with large batch arrivals, we must first simply understand the behavior of the queue in this setting. From the general batch arrival queueing models defined in Section 3, we have a natural sequence of systems indexed by the relative batch size n. In the multi-server model, Q^C_t(n), both the batch sizes, {B_i(n) | i ∈ Z^+}, and the number of servers, cn, depend on n, whereas in the infinite server model, Q^∞_t(n), only the batch sizes do. Because we want to reason about the system as the batch size grows large, let us suppose that there exists a sequence of positive independent and identically distributed random variables, {M_i | i ∈ Z^+}, such that B_i(n)/n → M_i as n → ∞. Applying the limit indexing to the Kendall notation, we are considering a sequence of G^{B(n)}_t/GI/cn systems. As this notation implies, this staffing level is inherently QD in style, as the number of servers, cn, is exactly proportional to the effective arrival rate. The staffing results we find for this large batch setting will follow as an immediate consequence from a connection we prove between batch arrival queues and storage processes, the continuous time stochastic processes we have surveyed in Section 2. To build intuition for how the batch arrival queue behaves as the relative batch size increases, let us start by decomposing the infinite server queue length. Because the sheer abundance of servers implies that no customer will wait for service, any customer in the system at time t is in service. In other words, the queue length at time t is the collection of customers that arrived before t but complete service after t. Summing over the arrival epochs and batches so far, this equivalence yields the equation

Q^∞_t(n) = Σ_{i=1}^{N_t} Σ_{j=1}^{B_i(n)} 1{t − A_i < S_{i,j}}.   (3)

Here, the queueing dynamics are plain: Q^∞_t(n) will jump up by the amount of the i-th batch size B_i(n) upon the i-th arrival epoch, and it will then jump down a unit size upon each service completion.
As the relative batch size increases, the size of the up-jumps will dwarf the size of the down-jumps, but the down-jumps will also become increasingly frequent. Following that intuition, let us define an alternate stochastic process, ψ^∞_t, in Equation (4), where ψ^∞_0 is a known initial condition:

ψ^∞_t = ψ^∞_0 Ḡ(t) + Σ_{i=1}^{N_t} M_i Ḡ(t − A_i).   (4)

The idea here is similar to (3), with jumps upward at epochs given by a distributionally equivalent point process N_t (hence the duplicated notation), but this is the only source of stochasticity in the model. That is, otherwise, the behavior is deterministic. Thus, ψ^∞_t is a shot noise process, with the behavior between arrival epochs determined by the tail CDF of the queue's service distribution, Ḡ(·). This suggests the connection between the two models. Hence, the law of large numbers links the infinite server queue and the shot noise process: if normalized by the batch size, the sum over the distributionally identical indicator functions should converge to the tail CDF as n → ∞. To formalize this reasoning, let us assume that the known initial conditions converge, i.e. Q^∞_0(n)/n → ψ^∞_0 as n → ∞. Then, in Theorem 1 we prove that under this large batch scaling regime, the general batch arrival infinite server queue converges to a general shot noise process.

Theorem 1. As n → ∞, the batch scaling of the G^B_t/GI/∞ queue converges, Q^∞_t(n)/n → ψ^∞_t, pointwise in t ≥ 0, where ψ^∞_t is a shot noise process with the i-th jump having size M_i for each i ∈ Z^+, as defined in Equation (4).

Conceptually, this large batch scaling bears similarity to a fluid scaling, as we are "shrinking" customers while also increasing the inflow of customers. However, by comparison to a traditional fluid limit, this increase in inflow is through a greater amount of simultaneous arrivals rather than a faster rate of solitary arrivals. Hence, the randomness of the arrival epoch point process is preserved, and this is what yields the random limit. If Q^∞_t(n) answers the question "how many customers are in the system at time t?" then after normalizing by n, Q^∞_t(n)/n must answer "how many batches of customers are in the system at time t?" As the relative batch size grows large, Theorem 1 implies that the latter of these questions is effectively answered by ψ^∞_t as well. Because ψ^∞_t is deterministic between arrival epochs, it offers a more amenable platform for analysis of the queue with large batch arrivals. In fact, if Ḡ(·) is continuous then ψ^∞_t will be continuous as well. We will leverage these concepts in the next subsection for our batch arrival queue staffing methodology. Before doing so, let us first briefly provide a little more intuition about the shot noise process ψ^∞_t. If the arrival epochs follow a time homogeneous Poisson process, then we can leverage conditional uniformity and Equation (4) to provide a closed form expression for the limiting generating function.

Corollary 1. If N_t is a stationary Poisson process with rate λ > 0, the moment generating function of the batch scaled queue length converges to that of the shot noise process, which, for ψ^∞_0 = 0, is given by

E[e^{θψ^∞_t}] = exp( λ ∫_0^t ( E[ e^{θM_1 Ḡ(s)} ] − 1 ) ds ),

for each t ≥ 0 as n → ∞.

Our assumption here that the arrival process is Poisson is only temporary, but it gives us valuable insight into the properties of the limiting distribution. In particular, the nested exponential form of the moment generating function implies that the distribution of the shot noise process is not sub-Gaussian, meaning that its tail is heavier than the tail of the normal.
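As a quick numerical check of Theorem 1, consider the Markovian special case with exponential service and deterministic batches B_i(n) = n (so that each M_i = 1); this sketch is our own construction under those assumptions. Conditional on the Poisson arrival epochs, each of a batch's n customers remains in service at time t independently with probability Ḡ(t − A_i) = e^{−µ(t−A_i)}, so the scaled queue length can be compared against the shot noise process built from the very same epochs.

```python
import numpy as np

rng = np.random.default_rng(1)

def scaled_queue_and_shot_noise(n, lam, mu, t):
    """One coupled sample of (Q_t(n)/n, psi_t) for an M^B/M/infinity queue
    with deterministic batch size n, sharing the same Poisson epochs."""
    num_epochs = rng.poisson(lam * t)
    epochs = rng.uniform(0.0, t, size=num_epochs)   # conditional uniformity
    survive = np.exp(-mu * (t - epochs))            # Gbar(t - A_i) for exp service
    q = rng.binomial(n, survive).sum()              # customers still in service
    psi = survive.sum()                             # shot noise with M_i = 1
    return q / n, psi

lam, mu, t = 2.0, 1.0, 5.0
for n in [10, 100, 1000, 10000]:
    gaps = [abs(q - psi) for q, psi in
            (scaled_queue_and_shot_noise(n, lam, mu, t) for _ in range(200))]
    print(n, np.mean(gaps))   # mean |Q_t(n)/n - psi_t| shrinks as n grows
```

Because the two processes are driven by the same epochs, the printed gap isolates the batch scaling error itself, and it visibly decays as the relative batch size n increases.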
This heavier-than-Gaussian tail will be important throughout our staffing analysis since the exceedance probability is inherently a tail probability event, and it will further gain relevance in contrast with the asymptotic normality that we discuss in Section 5. Having gained intuition from the connection of the infinite server batch arrival queue and the general shot noise process, let us now turn to the multi-server model that is at the heart of our staffing problem. Setting aside the initial customers in the system for the moment for the sake of space on the page, in Equation (7) we can see that we can also decompose the multi-server queue length into a sum over which customers are still in the system. However, by comparison to the infinite server model, the customers present in the multi-server system at time t are not only the customers actively being served, but also the ones who are waiting to receive service. Hence, we must correct the sum over prior arrivals to also include the effect of waiting. This yields

Q^C_t(n) = Σ_{i=1}^{N_t} Σ_{j=1}^{B_i(n)} 1{t − A_i < S_{i,j}} + Σ_{i=1}^{N_t} Σ_{j=1}^{B_i(n)} ( 1{t − A_i < W_{i,j} + S_{i,j}} − 1{t − A_i < S_{i,j}} ),   (7)

and if we were to include the customers present at time 0, we would simply mimic both summations for that population of customers. Inspecting (7), we can recognize a familiar form. The first summation is identical to what we have seen in the infinite server queue length decomposition, and thus by Theorem 1 we know that as the relative batch size increases this should resemble the shot noise process if properly normalized. Hence, we turn our attention to the waiting time correction sum. To start, let us reason about when waiting should occur, since W_{i,j} > 0 is necessary for these indicator functions to ever be equal to 1 for some value of t. By definition, a customer will wait in the multi-server queueing model when the number of present customers is greater than the number of servers, and the delay of their start of service will be longer when the number of excess customers is higher. If the number of customers is no more than the number of servers, then the multi-server queue will behave just like the infinite server queue and there will be no waiting. However, whenever the queue length exceeds the staffing level, the service will be bottlenecked by the number of servers. Following this intuition, let us introduce a storage process model, ψ^C_t, that modifies the shot noise process in an analogous fashion. If the storage process is below a capacity level c, it should behave just like the shot noise process, but when the storage process exceeds the capacity its behavior should be limited by that level. This leads us to the definition of ψ^C_t in Equation (8), which caps the release rate of the shot noise process without capacity at the level c. Like the relationship between Q^∞_t(n) and ψ^∞_t, we can see that ψ^C_t mimics the behavior of Q^C_t(n): up-jumps at the arrival epochs, and then decreases between, with the rate of decrease being limited by the staffing or capacity level. Furthermore, just like the shot noise process ψ^∞_t, here we can see that ψ^C_t has deterministic behavior between arrival epochs, capturing the relative lack of variability seen at a large scale. With an analogous initial condition limit assumption as before, Q^C_0(n)/n → ψ^C_0 as n → ∞, this leads us to our first main result, the general batch scaling limit of the multi-server queue in Theorem 2.

Theorem 2. As n → ∞, the batch scaling of the G^B_t/GI/cn queue converges, Q^C_t(n)/n → ψ^C_t, pointwise in t ≥ 0, where ψ^C_t is a generalized storage process as defined in Equation (8).

Just like we remarked for the comparison of Q^∞_t(n) and ψ^∞_t, Q^C_t(n) answers the question "how many customers are in the system?"
while ψ^C_t answers "how many batches are in the system at time t?" We can also reframe the staffing problem in the same manner. Rather than searching for a staffing level cn that delivers a sufficiently low probability that the number of customers will exceed the number of servers, Theorem 2 allows us to instead seek a level c such that the probability that the number of batches will exceed it is sufficiently low. This shows us that, in the presence of large batches, the staffing level must be directly proportional to the batch size. That is, by direct consequence of the convergence of the batch arrival queue to the storage process, itself a stochastic model, the queue's exceedance probability at staffing level cn converges to a non-degenerate probability.

Corollary 2. As n → ∞, P(Q^C_t(n) ≥ cn) → P(ψ^C_t ≥ c) ∈ (0, 1) for each t ≥ 0 such that the arrival epoch process N_t satisfies the condition in (11), meaning that the arrival epoch process does not render the storage process exceedance probability trivially degenerate.

Plainly, if the relative batch size doubles, Theorem 2 and Corollary 2 show that the queue's staffing level should precisely double as well. This shows the lack of an economy of scale: unlike when an arrival rate grows large, there is not a labor savings benefit as the batch size grows large. To this same end, this shows the strong demands of simultaneous arrivals to service systems. The staffing in this system is directly proportional to the effective arrival rate, like in the QD regime, but Theorem 2 shows that the limit is still random and Corollary 2 emphasizes that this high level of staffing still yields a non-degenerate exceedance probability, like in the QED regime. We will further explore the comparison of large batches and large arrival rates in Section 5. Let us note that the arrival epoch process condition in (11) is hardly restrictive. For example, it is immediately satisfied for N_t as a (non-stationary) Poisson process with nonzero rate. Still, the storage process may be somewhat opaque as defined in (8). To provide some intuition about this stochastic process, let us consider the case of exponentially distributed service. For the sake of this example, suppose that Ḡ(x) = e^{−µx} for some µ > 0. Then, Equation (8) yields an integral representation of ψ^C_t; multiplying and dividing by e^{−µt} inside the integral and changing the variable of integration to be s instead of t − s, we can re-express the process in a form that is differentiable between jumps. Since we know that the process jumps by M_i at the i-th arrival, let us take t ∈ (A_i, A_{i+1}) and focus on the behavior between jumps. Because storage processes are deterministic on inter-jump intervals, we can take the derivative with respect to time and observe that for t ∈ (A_i, A_{i+1}),

d/dt ψ^C_t = −µ min(ψ^C_t, c).

Hence, in the case of exponential service the inter-jump dynamics of this process can be easily summarized. If ψ^C_t is above the threshold c it drains linearly, if it is below c it decays exponentially. This precisely matches what we would expect from a G_t/M/c queue: departures at a rate proportional to the minimum of the number in system and the number of servers. As an example of the limiting threshold dynamics, in Figure 2 we plot a simulated scaled queue length sample path along with the calculated storage process values when given the same arrival epochs. Let us note that our focus in this section has been to establish the key managerial takeaway showing the strong demand that batch arrivals place on service systems, as captured in Theorem 2's insight that QD-style staffing will yield QED-style performance.
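To complement Figure 2, here is a minimal sketch (our own, with assumed parameter values) of how the exponential-service storage process ψ^C_t can be evaluated along a time grid directly from the arrival epochs, using the inter-jump dynamics just derived: linear drain at rate µc while above the capacity and exponential decay at rate µ while below it.

```python
import numpy as np

def storage_path(epochs, jumps, mu, c, t_grid):
    """Evaluate psi^C_t on an increasing time grid. Between arrival epochs
    the process follows d psi/dt = -mu * min(psi, c): a linear drain at
    rate mu*c while above level c, exponential decay at rate mu below c."""
    def decay(x, s):
        if x > c:
            hit = (x - c) / (mu * c)        # time to drain down to level c
            if s <= hit:
                return x - mu * c * s
            x, s = c, s - hit               # continue decaying from level c
        return x * np.exp(-mu * s)
    events = sorted(zip(epochs, jumps)) + [(np.inf, 0.0)]
    path, level, last, i = [], 0.0, 0.0, 0
    for t in t_grid:
        while events[i][0] <= t:            # apply all jumps up to time t
            level = decay(level, events[i][0] - last) + events[i][1]
            last = events[i][0]
            i += 1
        path.append(decay(level, t - last))
    return np.array(path)

rng = np.random.default_rng(2)
lam, mu, c, t_end = 2.0, 1.0, 3.0, 10.0     # assumed rates and capacity level
epochs = np.cumsum(rng.exponential(1.0 / lam, size=50))
epochs = epochs[epochs < t_end]             # Poisson arrival epochs on [0, t_end)
jumps = np.ones_like(epochs)                # M_i = 1, i.e., deterministic batches
print(storage_path(epochs, jumps, mu, c, np.linspace(0.0, t_end, 401))[-5:])
```

Overlaying this path with a simulated Q^C_t(n)/n driven by the same epochs, as in Figure 2, shows the two trajectories agreeing ever more closely as n grows.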
That insight shows the absence of an economy of scale. Furthermore, Corollary 2 implies that the staffing problem for the batch arrival queue can be directly translated to a staffing problem for the storage process limit. While we have not explicitly said how to staff a storage process here, this is the focus of Appendix D, in which we build upon results available in the storage process literature. In the remainder of the main body of the paper, we are interested in finding a second key managerial insight, specifically for the case in which customers arrive both en masse and quite frequently. Following Section 4's consequences for queues with large batch arrivals, it is natural to wonder what interplay exists between large batches and large arrival rates. This section's results will reveal a spectrum between the large batch limit in Theorem 2 and the classical QED heavy traffic limit, originating in Halfin and Whitt (1981). To identify the interior between these extremes in sufficient clarity, we will now focus our attention on the M^n/M/c queue in steady-state. That is, we will suppose the batches are all of a fixed size and that they occur at arrival epochs according to a homogeneous Poisson process. The limiting regimes we now consider here will be what we will refer to as a hybrid scaling, in which the effective arrival rate grows large through both the batch size and the batch arrival rate. Since we are now considering a more specific setting, let us define updated notation. First, let us introduce m as the relative effective arrival rate; this will be the index for our hybrid limits. Following the steady-state assumption, we will drop the t subscripts and let Q^∞(m) and Q^C(m) be the infinite and multi-server queue lengths, respectively. Perhaps the most important parameter of this scaling will be ν ∈ [0, 1], which dictates the relative weight of the batch size and the batch arrival rate within the effective arrival rate. That is, we will suppose that all batches are of size nm^ν for some constant n > 0, and likewise we will let the batch arrival rate be λm^{1−ν} for some λ > 0. Hence, the product of the batch size and the batch arrival rate, λnm, is the effective arrival rate of customers to the service system. By convention, we will assume that nm^ν is an integer to avoid cumbersome expressions, but this analysis can be carried through with a rounded quantity as the batch size instead, such as ⌈nm^ν⌉. If ν = 1, this scaling reduces to the large batch setting in Section 4, and if ν = 0, we will recover the classical QED regime. To begin building intuition on the interplay of the arrival rate and the batch size in both queueing models, let us first review some properties of the infinite server queue. In particular, since the classical QD and QED regimes are closely related to law of large numbers and central limit theorem type results, the mean and variance of the queue length hold particular relevance. Hence, in Proposition 2 we provide the mean and variance of the steady-state queue length for this hybrid scaling parameter setting.

Proposition 2. In an M^n/M/∞ queue with arrival rate λm^{1−ν} and batch size nm^ν, the mean steady-state queue length is given by

E[Q^∞(m)] = λnm/µ,

and the steady-state variance is equal to

Var(Q^∞(m)) = λnm(nm^ν + 1)/(2µ),

where ν ∈ [0, 1].

Observe how these quantities relate for different values of ν, or specifically, how the mean and standard deviation compare as ν changes. Starting with the classical, at ν = 0 the standard deviation is of order √m, whereas the mean is of order m.
On the other hand, at ν = 1 for the pure large batch scaling setting, the mean and standard deviation are both of order m. This immediately adds context to our prior results obtained in Section 4, and we will continue to explore that special case in this section. On the interior, for ν ∈ (0, 1), the standard deviation is then of order m^{(1+ν)/2}, making it not quite on the same level as the mean but greater than the square root of it. Hence, as long as ν < 1, the order of the mean dominates the order of the standard deviation, and through this we can find different limits for the two different orders of normalization.

Theorem 3. Let ν ∈ [0, 1). As m → ∞ in the M^n/M/∞ queue with arrival rate λm^{1−ν} and batch size nm^ν, the steady-state queue length converges to a constant when normalized by m:

Q^∞(m)/m → λn/µ.

However, when centered by its mean and normalized by m^{(1+ν)/2} √(1 + 1/(nm^ν)), the steady-state queue length converges to

( Q^∞(m) − λnm/µ ) / ( m^{(1+ν)/2} √(1 + 1/(nm^ν)) ) ⇒ X

as m → ∞, where X ∼ Norm(0, λn²/2µ).

Let us make the contrast clear: if ν = 1, there is only one limit, and this is given by Theorem 1. Furthermore, as made plain by Corollary 1 and the surrounding remarks, this limit is random but it is not asymptotically normal. Rather, the limiting distribution dominates Gaussian or sub-Gaussian distributions. As a demonstration of this, we compare the density function of a standard normal to the simulated histograms of the centered and normalized infinite server queue under a range of hybrid scaling settings in Figure 3. As we can see, the asymptotic normality of the hybrid scaling is overwhelming when the arrival rate is near the same or higher order than the batch size; it is close even when the batch size is 10,000 and the arrival rate is 10. But, when the limit is fully on the batch size, the distribution of the scaled and centered queue is clearly not normal. Because the leading order of the standard deviation is m^{(1+ν)/2}, one can think of the normalization as being of this order (without the multiplied square root term) for simplicity, if preferred. We will make use of this condensed presentation in our subsequent staffing analysis in the next subsection. Following Theorem 3, we see how this hybrid large batch and large arrival rate limit recovers a Goldilocks takeaway like what QED offers, albeit still at a higher order. If ν < 1 in the hybrid regime, scaling by m is too much, scaling by √m is too little, but scaling by m^{(1+ν)/2} is just right. However, if ν = 1, there is no choice to be made; there is only one scaling and it is of order m. Now, to properly connect these results to staffing decisions and relate this spectrum of regimes to QED, we must move beyond the infinite server queue to the multi-server. So, for the second of our two main results, we will apply this hybrid scaling limit to the case of finitely many servers. We will still set the arrival rate and batch size as λm^{1−ν} and nm^ν for the limit indexed by m with spectrum parameter ν, and because of Theorem 3 we will set the staffing level as λnm/µ + δm^{(1+ν)/2} for some δ > 0. In this case, we prove a steady-state exceedance probability limit fashioned in the style of Proposition 1 of Halfin and Whitt (1981) with higher order safety staffing.

Theorem 4. Let ν ∈ (0, 1). As m → ∞ in the M^n/M/c queue with arrival rate λm^{1−ν}, batch size nm^ν, and staffing level λnm/µ + δm^{(1+ν)/2} for some δ > 0, the steady-state exceedance probability converges to

P( Q^C(m) ≥ λnm/µ + δm^{(1+ν)/2} ) → ( 1 + β Φ(β)/φ(β) )^{−1}, with β = (δ/n)√(2µ/λ),   (16)

where Φ(·) is the cumulative distribution function of a standard normal random variable and φ(·) is its density.
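As a rough numerical companion to Theorem 4 (our own sketch: the parameter choices λ = n = µ = δ = 1 are assumptions, and the Halfin-Whitt parameter β = (δ/n)√(2µ/λ) is taken from the discussion following the theorem), the snippet below evaluates the limiting exceedance probability and the prescribed staffing level across ν at a fixed relative effective arrival rate m, previewing the Figure 4 comparison discussed next.

```python
import math

def hw_delay(beta: float) -> float:
    """Halfin-Whitt limiting delay probability (1 + beta*Phi(beta)/phi(beta))^(-1)."""
    phi = math.exp(-beta * beta / 2) / math.sqrt(2 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1 + math.erf(beta / math.sqrt(2)))             # standard normal cdf
    return 1.0 / (1.0 + beta * Phi / phi)

lam = n = mu = 1.0        # base constants; the effective arrival rate is lam*n*m
delta = 1.0               # assumed safety coefficient
m = 100_000
beta = (delta / n) * math.sqrt(2 * mu / lam)
for nu in [0.0, 0.2, 0.4, 0.6, 0.8]:   # nu = 1 is excluded: Theorem 4 needs nu < 1
    servers = lam * n * m / mu + delta * m ** ((1 + nu) / 2)
    print(f"nu={nu:.1f}  servers={servers:,.0f}  limiting P(exceed)={hw_delay(beta):.3f}")
# Safety staffing gap between nu = 1/5 and nu = 0 (the 'extra workers' below):
print(delta * (m ** 0.6 - m ** 0.5))   # roughly 684 when delta = 1
```

The limiting probability is identical at every ν on the interior while the safety staffing grows from order √m toward order m; for these assumed parameters, the final line reproduces the nearly 700 extra servers separating ν = 1/5 from ν = 0 at m = 100,000, echoed in the discussion of Figure 4 below.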
The asymptotic normality on the interior of the spectrum makes the limit more closely related to QED staffing than may have been obvious. In fact, the expression for the limiting value in the right hand side of Equation (16) exactly matches the right hand side of Proposition 1 of Halfin and Whitt (1981) for a Halfin-Whitt parameter β = (δ/n)√(2µ/λ). Hence, arrivals in large batches and at large rates recover precise QED performance as these batch sizes and arrival rates grow large simultaneously, but to properly account for the batches, the order of the safety staffing must be higher than what QED prescribes. As we alluded to in Section 2, this both matches and justifies peakedness approximations such as in Whitt (1992), while also providing a cautionary tale: service managers should remain wary that peakedness parameters may not be fixed and may increase in tandem with the effective arrival rate. For these reasons, we say that an economy of scale does exist in the presence of large batches and large arrival rates, but that it is weaker than what is typically expected when only the arrival rate is large. That is, if the effective arrival rate λnm doubles through m doubling, the resulting staffing will be less than double the prior level so long as ν < 1. In the case that ν = 1, we revert to the pure large batch setting in Section 4, in which there is no economy of scale and the new staffing is precisely double the level before. Closing this section with a numerical demonstration, in Figure 4 we provide an analog of the simulation experiments from Figure 3 for the multi-server setting. The effective arrival rate is held constant at m = 100,000, while ν ∈ {0, 1/5, 2/5, 3/5, 4/5, 1}. The staffing level is then set according to Theorem 4, and the simulated empirical exceedance probability is compared with the asymptotically normal case and the storage process limit case. We can see here that for ν ∈ {0, 1/5, 2/5, 3/5} the hybrid limit is clearly manifesting and that the simulated queue lengths appear to be converging from above. As ν increases, though, the rate of convergence appears to slow, with ν = 4/5 not having large enough m for the hybrid limit to fully take effect. However, it is still less than the storage process value seen for the pure batch scaling limit at ν = 1. Even in the cases in which the simulated exceedance probability is close to the expression from Theorem 4, it is important to watch the magnitude by which the staffing level is growing. Although the effective arrival rate is held fixed, the safety staffing rises as the weight of the arrivals shifts onto the batch size. Comparing the extremes, the same effective arrival rate of traffic at ν = 1 requires nearly double the staffing that is required at ν = 0, the QED case, while actually still falling short of the exceedance probability seen at ν = 0. Even comparing very mild batches (n = 10 and λ = 10,000 at ν = 1/5) to the QED case (n = 1 and λ = 100,000 at ν = 0), we see that nearly an extra 700 workers are needed to achieve the exact same performance. In many ways, this brings the conversation around Figure 1 to a full circle. In Figure 1, we saw how poorly batch arrival systems could perform when staffed as if they were solitary arrivals at the same effective rate. In Figure 4, we have demonstrated how to adjust the staffing in preparation for batches, as prescribed by our two main results, Theorems 2 and 4.
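As a concrete reading of Theorem 4, the following Python sketch computes the prescribed staffing level across the spectrum and evaluates the Halfin-Whitt-style limiting delay probability. This is our own illustration under assumed parameter values (λ = n = µ = δ = 1, m = 100,000), not code from the paper, and the limit formula applies only for ν ∈ (0, 1).

import math

def staffing_level(lam, n, mu, m, nu, delta):
    # Theorem 4 prescription: offered load lam*n*m/mu plus safety
    # staffing of order m^((1+nu)/2).
    return lam * n * m / mu + delta * m ** ((1 + nu) / 2)

def limiting_delay_prob(lam, n, mu, delta):
    # Halfin-Whitt form (1 + beta*Phi(beta)/phi(beta))^(-1) with
    # beta = (delta/n) * sqrt(2*mu/lam); valid for nu in (0, 1).
    beta = (delta / n) * math.sqrt(2 * mu / lam)
    Phi = 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))
    phi = math.exp(-0.5 * beta * beta) / math.sqrt(2.0 * math.pi)
    return 1.0 / (1.0 + beta * Phi / phi)

lam = n = mu = delta = 1.0
m = 100_000
base = lam * n * m / mu
for nu in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    c = staffing_level(lam, n, mu, m, nu, delta)
    print(f"nu={nu:.1f}: servers={c:,.0f} (safety {c - base:,.0f})")
print("limiting P(delay) for nu in (0,1):", round(limiting_delay_prob(lam, n, mu, delta), 4))

Under these assumed values, the safety staffing grows from roughly √m ≈ 316 servers at ν = 0 to m = 100,000 servers at ν = 1, mirroring the near-doubling of total staffing described above.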
In this paper, we have found that service systems with large batches or rapid bursts of arrivals face serious operational challenges. Our results show that such arrival patterns demand a high level of staffing, so much so that there may not even be an economy of scale as the effective arrival rate of customers grows. Specifically, in our two main results, we have seen that there is truly no economy of scale in the pure large batch limit (Theorem 2), and in the case of large batches and fast arrival rates, an economy of scale may occur but it will be weaker than what is typically expected (Theorem 4). Comparing to the classic literature on staffing service operations, this means that for large batches it takes the high commitment of QD-style resources just to achieve what is typically a balanced compromise of performance, as classically offered by the QED regime. In the hybrid case, it remains true that QED-type performance will take more than typical QED-style staffing, but not quite as much as the QD regime. Under the hood, these large batch insights are powered by a connection between batch arrival queues and storage processes. Our hybrid limits further reveal that, at the large batch extreme of this spectrum, the typically distinct law of large numbers (fluid) style and central limit theorem (diffusion) limits are actually one and the same. Here, the batch arrival queue's mean and standard deviation are of the same order, and thus there is no asymptotic normality, only the storage process limit. We have studied this batch arrival staffing problem after being inspired by many different applications, so let us discuss these further with our results now in hand. Beginning with our original motivation, autonomous vehicle teleoperations or remote assistance, our insights are critical for evaluating whether these support techniques are viable at scale. In particular, if the batch size, meaning the number of support tasks per request or incident, grows with the number of vehicles, then our large batch limits unfortunately imply that remote assistance will be proportionally no more economically feasible than having in-car safety drivers as the fleet grows, essentially reducing to the current state of taxis and ride share. However, if the number of tasks does not grow with the fleet size, or if both the request rate and the number of tasks grow, the hybrid limits suggest that there will be an economy of scale and that remote assistance may very well be viable. Hence, as this technology develops and begins to scale, the future of autonomous vehicle remote assistance may very well hinge on how the number of simultaneous jobs and the rate of requests each grow as the customer base and vehicle fleet grow. In this application and in others, such as quarantine beds, batch computing methods, and classical transportation scenarios, the full breadth of our results demonstrates the general necessity of additional preparation when the customer traffic mix includes batches or rapid bursts. In particular, the safety staffing levels need to be adjusted to account for this simultaneity in arrivals, and in this work we have rigorously shown how much adjustment is needed. We can also recognize some limitations of our models and analysis for these applications, and many of these limitations should make for interesting and important future research problems.
For example, the autonomous vehicle remote assistance and parallel data processing examples may call for intentional redundancy of tasks within a batch, and this would naturally create dependence among service durations. Informally, we can posit that our results should likely carry through directly if the dependence between jobs is weak, since law of large numbers results can still be achieved in such cases. We conduct an initial exploration into the consequences of dependence within batches through a series of simulation experiments in Appendix F. Furthermore, many of our motivating applications may call for priority rules in systems with multiple types of jobs and servers with many different skills. We think this is a highly interesting and practical direction of subsequent research, and we suspect that there are likely some intriguing families of stochastic models that would arise in the large batch limits of such settings. We also believe that there are many other important directions for future work on the models and limits which we have studied here, ranging from applied to analytical. For an example of the latter, there is likely a richness of opportunity around the hybrid limits. Section 5 intentionally considers the simple Markovian steady-state setting to demonstrate the spectrum of limits in full clarity, but it is likely that this can be advanced in intriguing technical directions, such as achieving process level convergence. We are quite interested in such questions, although we consider them out of scope for the present focus on staffing managerial insights. There are also likely many interesting connections to be made between these insights and models closely related to batch arrival queues, such as the fork-join queues we have discussed in Section 2, or systems that require simultaneous starts of service, like in Hong and Wang (2021). We have already remarked that these types of models should require at least as much staffing as a batch arrival queue because of the de-centralized waiting, and it would be interesting to analyze this further. On the application side, staffing problems for human-in-the-loop systems like the autonomous vehicle remote assistance methods have an important place in future-of-work discussions, particularly in light of trends following Covid-19. For example, there were examples early in the pandemic of autonomous vehicles being teleoperated in medical settings for the sake of the safety of the drivers (Ford 2020). One can imagine that with a surge in desire for work-from-home jobs, the demand for such roles may persist even beyond the present benefits to public health. In such settings, both the insights and the methodology that we have developed in this paper should be of use in designing, analyzing, and managing these systems. Previous drafts of this paper focused exclusively on the autonomous vehicle teleoperations context; we are grateful to anonymous referees who have helped us recognize the generality and applicability of our results. Notes 1. The preceding degree of capability, Level 4, is the corresponding level for what may be achievable with occasional human help when outside understood and controlled environments. 2. Hampshire et al. (2020) does consider a staffing problem, but given the broad intended audience, the analysis stuck to traditional tools for solitary arrivals rather than batches, despite describing the remote assistance methodology as requiring "several human supervisors" for each request. 3.
This single-centralized-queue versus multiple-separated-queues comparison hints at some of the important differences between the batch arrival queueing model and the fork-join queueing model. We discuss the differences between these models in more depth in Section 3. 4. In some early works that classify the regimes, such as Borst et al. (2004) and Garnett et al. (2002), QED staffing is called the "rationalized" regime, referring to its balance between quality and efficiency. 5. Eckberg (1983) appears to be the foundational reference for peakedness approximations, but we have had trouble locating this manuscript. 6. This is modeled after the structure of the ephemerally self-exciting process (ESEP) studied in Daw and Pender (2022). 7. We could sharpen the convergence (to the same result) by having the staffing level instead as λnm/µ + δm^((1+ν)/2)√(1 + 1/(nm^ν)), but we stick with a safety staffing of δm^((1+ν)/2) for a simplified and clear presentation. This appendix contains proofs of our main results, as well as supporting and auxiliary results that expand the story of this work. Let us outline those here. Appendix A contains the proofs of the large batch limits contained in Section 4, and likewise Appendix B contains the proofs of the hybrid limits from Section 5. Appendix C then contains proofs of supporting and related results to these limits, as well as some additional results not yet shown in the paper. As mentioned in the main body of the text, Appendix D contains results and techniques for staffing the large batch arrival queue by way of staffing the storage process, and Appendix E houses technical results supporting this. Finally, Appendix F contains simulation experiments and discussion on the impacts of dependence between jobs within the same batch. A.1. Proof of Theorem 1 Proof. We will show the convergence of the batch scaling of the queue through analyzing its moment generating function. To begin, we note that the infinite server queue length can be expressed in terms of indicator functions as Q_t(n) = Σ_{j=1}^{Q_0(n)} 1{S_{0,j} > t} + Σ_{i=1}^{N_t} Σ_{j=1}^{B_i(n)} 1{S_{i,j} > t − A_i}, where S_{i,j} is the service duration of the j-th customer within the i-th batch and S_{0,j} is the remaining service time of the j-th job that was in service at time 0. In this way, the first term on the right hand side represents the number of jobs in the system at time 0 that remain in the system at time t, whereas the double summation counts the number of jobs from each batch that remain in service at time t. Because there are infinitely many servers, we can note that the number of jobs remaining since time 0 is independent of the number of jobs in the system that entered after time 0. Hence, we will consider these groups separately. Starting with those jobs initially present, we can note that since {S_{0,j} | 1 ≤ j ≤ Q_0(n)} are the only stochastic terms, the law of large numbers yields the convergence of this normalized initial term to its deterministic limit. Thus, without loss of generality, we will henceforth assume that the queue starts empty. We then write the moment generating function of Q_t(n) evaluated at θ/n. By conditioning on the filtration of the counting process F_t^N, the law of total expectation allows us to focus on the inner expectation, where we again use the tower property. We now condition on the batch size B_i(n), which leaves the service duration as the only uncertain quantity. The indicator is thus a Bernoulli random variable with success probability Ḡ(t − A_i), and since these are i.i.d. within the batch, the conditional expectation factors across the jobs. By now using the identity x = e^(log(x)), we can transform this product, which we can now re-express further through two series expansions.
Specifically, using a Taylor series expansion on e^(θ/n) − 1 and a Mercator series expansion on log(1 + Ḡ(t − A_i)(e^(θ/n) − 1)), we simplify the resulting product. Returning to the original expectation and letting n → ∞, the expression converges to the moment generating function of the shot noise process, which yields the stated result for the queue. A.2. Proof of Theorem 2 Proof. In a manner similar to the proof of the infinite server to shot noise convergence in Theorem 1, we begin by decomposing the queue length process into a sum of indicators. By comparison to the infinite server decomposition, however, these indicators depend not only on the batch arrival epochs and the individual service durations, but also on the lengths of time that jobs wait to begin service while the servers were occupied. Recalling that W_{i,j} is the total time the j-th job within the i-th batch spends waiting, we can express the queue length in the delay model at time t as a four-part sum, which we will refer to as Equation (17). One can interpret this decomposition as follows. The first double summation across arrival epochs and batch sizes gives an idealized infinite server representation that would be accurate if no jobs had to wait to begin service. The second double summation then corrects that under-counting for any jobs that had to wait and have not yet completed service at time t. The third and fourth terms then capture the initial state of the system, with the third term counting which jobs have remained in service from time 0 to time t and with the fourth term counting the number of jobs that were waiting at time 0 and have not completed service by t. Here we use S_{0,j} to represent the remaining service times of the jobs that are in service at time 0, and we use W_{·,j} and S_{·,j} to represent the waiting and service times of the jobs that are present in the system at time 0 but were not in service. In this notation, the residual service time S_{0,j} need not be equivalent in distribution to S_{i,j} for i ∈ Z+, whereas S_{·,j} is equivalent to S_{i,j}. To begin moving towards the storage process limit, we first show a batch-arrival-queue analog of Proposition 2.1 from Reed (2009). That is, we seek to justify an alternate representation of the waiting-time correction terms, which we will refer to as Equation (18), and this follows from a generalization of the arguments from Reed (2009). Starting with the summations over the tail CDF terms, one can re-express these in terms of integrals over the service distribution measure, and these integrals can then be adjusted to a standard interval of [0, t] through the introduction of indicator functions. Then, one can recognize that the number of jobs waiting at an arbitrary time u ≥ 0 can be written as a sum in which the first term captures the number of jobs still waiting across each batch of arrivals and the second term captures the number of jobs that have been waiting since time 0. Thus, by exchanging the order of summation and integration, we achieve Equation (18). Returning now to the decomposition of the queue length in Equation (17), we can use the equality from Equation (18) to re-express the queue length. Through this decomposition, we will now prove that the batch scaling of the queue length converges to the generalized storage process. We proceed through induction on the arrival times, where one can suppose that we have conditioned on the filtration of the arrival process up to time t, as in the proof of Theorem 1. For the base case, let 0 ≤ t < A_1. Then, the normalized queue length at time t can be written in this decomposed form, which we now analyze piece by piece.
By the law of large numbers, the assumption on the initial values, and the continuous mapping theorem, the normalized initial-service term converges as n → ∞, where Ḡ_0(·) denotes the complementary CDF of the residual service durations of the initial jobs in service at time 0. We can also similarly observe the convergence of the idealized infinite server term as n → ∞. Now, for the summation over jobs that were waiting to begin service at time 0, we can employ a martingale argument such as that used in, e.g., Andrews (1988). Let S_j for 0 ≤ j ≤ (Q_0^C(n) − cn)+ be the filtration generated by the collection of service times of the jobs initially in service at time 0 and of the first j jobs to enter service after time 0, i.e. S_j = σ({S_{0,1}, . . . , S_{0,cn}, S_{·,1}, . . . , S_{·,j}}). Then, one can note that for j < (Q_0^C(n) − cn)+, W_{·,j+1} is S_j-measurable, as the previous service durations dictate the time that this job waits. Thus, we can recognize that E[1{t − W_{·,j} < S_{·,j}} | S_j] = Ḡ(t − W_{·,j}). This implies that the summation is a martingale difference sequence, and thus its normalization by n vanishes as n → ∞. Therefore, as n → ∞ the queue length on 0 ≤ t < A_1 converges to a process z(·) satisfying a deterministic integral equation. We can observe that on this time interval each of these terms is deterministic, and thus Proposition 3.1 of Reed (2009) yields that the function z(·) that solves this equation is unique. Since this matches the expression for ψ_t^C on 0 ≤ t < A_1 as given by Equation (8), we have the stated convergence on this interval. At the precise epoch of the first arrival, we furthermore have the convergence of the process immediately after the batch of jobs arrives, which is a direct consequence of the preceding arguments and the assumption that B_1(n)/n converges in distribution to M_1 as n → ∞. Thus, the base case of our inductive argument is satisfied. For the inductive step, we now assume that Q_s^C(n)/n converges in distribution to ψ_s^C as n → ∞ for s such that 0 ≤ s ≤ A_i and some i ∈ Z+. Let us now take t such that A_i ≤ t < A_{i+1}. We have established that we can decompose the normalized queue length as before, and we can again analyze this piece by piece. By the batch scaling convergence of infinite server queues to shot noise processes in Theorem 1, the idealized infinite server term converges as n → ∞. Similarly, analogous arguments to the base case show that the initial condition terms converge as well. For the remaining double summation over arrival epochs and batch sizes, we can again make use of a martingale structure. For i ∈ Z+ and j ∈ Z+, let S_{i,j} be the sigma algebra generated by the arrival times and service times of all jobs up to and including the j-th job within the i-th batch. Then, we have that W_{i,j+1} is S_{i,j}-measurable since the queue is operating under first-come-first-serve, meaning that only the previous jobs determine how long the j-th job in the i-th batch waits. Therefore, through martingale differences, this normalized double summation also vanishes as n → ∞. Bringing these pieces together, we now have that for t ∈ [A_i, A_{i+1}) the queue length process converges to a process z_i(·) satisfying the analogous integral equation. From the inductive hypothesis and the uniqueness given by Proposition 3.1 of Reed (2009), we have that z_i(s) = ψ_s^C must hold for all s ≤ A_i. One can then observe that z_i(t) is deterministic for A_i < t < A_{i+1}, meaning that Proposition 3.1 of Reed (2009) further implies that z_i(t) = ψ_t^C on this interval as well. To complete the inductive step, we can note that by the given convergence of the batch sizes to the jump sizes, we also have that Q_{A_{i+1}}^C(n)/n converges in distribution to ψ_{A_{i+1}}^C as n → ∞, and this completes the proof.
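Before moving to the hybrid limits, the following Python sketch illustrates Theorem 1 in the Markovian special case: the batch-scaled infinite server path Q_t(n)/n is compared with the shot noise path built from the same arrival epochs, where each unit jump decays as e^(−µ(t − A_i)) because Ḡ(u) = e^(−µu) and the deterministic batches give unit marks. The sketch, including its function name and parameter values, is our own construction for intuition, not code from the paper.

import numpy as np

rng = np.random.default_rng(1)

def scaled_batch_queue(epochs, services, grid):
    # Q_t(n)/n for the M^n/M/infinity queue: the fraction of each
    # batch still in service at each grid time.
    path = np.zeros_like(grid)
    for a, s in zip(epochs, services):
        live = grid >= a
        path[live] += np.array([np.mean(s > u - a) for u in grid[live]])
    return path

lam, mu, n, horizon = 1.0, 1.0, 5000, 10.0
grid = np.linspace(0.0, horizon, 201)
epochs = np.cumsum(rng.exponential(1 / lam, size=50))
epochs = epochs[epochs < horizon]
services = [rng.exponential(1 / mu, size=n) for _ in epochs]

q_path = scaled_batch_queue(epochs, services, grid)
# Shot noise limit: unit jumps at the same epochs with exponential decay.
psi = sum(np.where(grid >= a, np.exp(-mu * (grid - a)), 0.0) for a in epochs)
print("max gap to the shot noise path:", np.max(np.abs(q_path - psi)))

With n = 5000 jobs per batch, the gap is already small, which is also the computational point made in Appendix D: the limit path needs only the epochs and jump sizes, not the individual service durations.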
B.1. Proof of Theorem 3 Proof. As we did in the proof of Proposition 2, here we will invoke Proposition 2.4 of Daw and Pender (2019). We know that the queue length is equal in distribution to a sum of scaled Poisson random variables, i.e. Q^∞(m) = Σ_{j=1}^{nm^ν} jY_j in distribution, where Y_j ∼ Pois(λm^(1−ν)/(jµ)) are independent. Starting with the scaling with normalization by m, we consider the two deviation event sequences P(Q^∞(m)/m ≥ λn/µ + ε) and P(Q^∞(m)/m ≤ λn/µ − ε) for ε > 0. Beginning with the former, the sum-of-scaled-Poissons decomposition allows us to invoke a Chernoff inequality to bound the probability of this event. Through use of the Poisson moment generating function and Taylor expansions on e^(jθ), we can bound the term in the exponent, and hence we can now see that this event's probability is bounded above by an exponential function of m, since at each θ > 0 the initial Chernoff inequality expression is bounded above by the exponential function that we have just reached. Conveniently, we can identify the value of the infimum over θ, since simple derivative checks show that the function in the exponent is minimized at an explicit point. Now, we can observe that the logarithm of the expression inside the widest parentheses is negative, i.e. log((1 + µ/(λn))^(−(λ/µ+1/n)) e^(1/n)) = (λ/µ)(µ/(λn) − (1 + µ/(λn)) log(1 + µ/(λn))) < 0, since z − (1 + z) log(1 + z) < 0 for all z > 0. Hence, because Σ_{k=1}^∞ x^(k^p) < ∞ for x ∈ [0, 1) and p ∈ (0, 1] (see Note 8), these probabilities are summable in m, and so we now turn our attention to the latter of the two event sequences indexed by m. Since Q^∞(m) is non-negative, here we restrict to ε < λn/µ. Again through a Chernoff inequality approach on the sum-of-scaled-Poissons decomposition, we obtain a bound whose exponent features the terms λm^(1−ν)(e^(−θj) − 1)/(jµ). Now, we can observe that because e^(−x) − 1 ≤ (e^(−kx) − 1)/k for all k ≥ 1 and all x ≥ 0, we can bound the function in the rightmost exponent, and again this yields an object that is more amenable to minimizing in closed form. Here, direct calculus yields the minimizing point, and with it the upper bound on the probability of the event. By once more taking the logarithm of the expression inside the widest parentheses, we can see that the base of this exponent is less than 1, which follows immediately from the fact that z(log(1 − z) − 1) − log(1 − z) < 0 for all 0 < z < 1. Hence, again by the fact that Σ_{k=1}^∞ x^(k^p) converges for x ∈ [0, 1) and p ∈ (0, 1], these probabilities are summable as well, and thus by the Borel-Cantelli lemma we have reached the stated result for scaling by m. Turning now to the limit of the queue when centered by its mean and normalized by √(m^(1+ν) + m/n), we will approach this through the moment generating function provided by the sum-of-scaled-Poissons representation, which leads to a closed-form expression for the MGF. To tackle the limit as m → ∞, let us focus on the terms in the exponent. By expanding the nested exponential function of θ according to a Taylor series, we can push the sum over j through and see that the first order terms will cancel with the mean centering term. Furthermore, the terms of order three and above can all be seen to be no more than m^(−3(1−ν)/2) up to constants. Multiplying the leading m^(1−ν) through, we can simplify further, and in this form we can quickly see that if ν < 1, then as m → ∞ the terms of order three and above will vanish, leaving simply the moment generating function of X ∼ Norm(0, λn²/(2µ)) in the limit for ν ∈ [0, 1). B.2. Proof of Theorem 4 Proof. As motivated by the proof of Proposition 1 in Halfin and Whitt (1981), let us first translate the exceedance probability into terms defined relative to the infinite server queue rather than the multi-server queue.
For notational simplicity, let us temporarily suppress the precise dependence on m and briefly consider the arrival rate, batch size, and staffing as simply λ, n, and c, respectively. Looking at Lemma 1 and comparing the transition dynamics for Q^C and Q^∞, one can see that these two Markov chains share the same set of balance equations up until state c. That is, for π_k = P(Q^C = k) and π̃_k = P(Q^∞ = k), the corresponding equations hold for both systems for all k ≤ c. So, by multiplying by π̃_0 Γ(λ/µ + n + 1)/(Γ(λ/µ + 1) Γ(n + 1)) in both the numerator and denominator of the exceedance probability expression from Lemma 1, we can re-express that probability in terms of the infinite server quantities, where x+ = max{x, 0} for all x ∈ R. Returning to the scaling regime at hand with arrival rate λm^(1−ν), batch size nm^ν, and staffing level λnm/µ + δm^((1+ν)/2), we can now attack the two distinct objects in this expression separately. Knocking out the simpler piece first, we can quickly observe that Theorem 3 yields the limit of the first term in the denominator as m → ∞. Now, the remaining piece of the exceedance probability expression can be compared to a Gaussian counterpart, with the difference vanishing at a rate governed by some C_1 ≥ 0 and ε > 0. Hence, we can instead consider the limit of this Gaussian expression as m → ∞. Translating the expectation to an integral over the Gaussian density and converting this to a double integral, we can work towards identifying the limit: let us take a Taylor expansion of the exponential inside the double integral and evaluate the inner integral. Let us now apply a binomial theorem expansion to (δ − zb_m)^(2k+1). Because the zero-order zb_m term will cancel with δ^(2k+1), we can see that we are left with the first order zb_m term and orders two and above. The first order terms can then be collected back into a Taylor expansion of an exponential function, leaving a remainder that features b_m of at least quadratic order. Turning to the burst models, the exogenous burst duration vanishes as η grows, which shows the convergence in probability (see Note 9). For the distribution of the size of the burst, we can see that given the expiration time T_{Exo,η}, the total number of arrivals follows a Poisson distribution with mean ηα_{Exo}T_{Exo,η}. This Poisson-exponential mixture is known to yield a geometric distribution, which can be easily observed through manipulating the MGF. Here we can observe that the probability parameter in the geometric distribution is β_{Exo}/(α_{Exo} + β_{Exo}), which does not depend on η. This completes the proof for the exogenous case, so we turn to the endogenous model. Here we can invoke Propositions 3.3 and 3.5 of Daw and Pender (2022) to obtain the mean duration of the endogenously driven burst, as well as the distribution of the size of the burst for each z ∈ Z+. Each of these can be obtained by viewing the endogenous burst model as a random walk with absorption. The probability mass function of Z_{Endo} already has no dependence on η, so we are only left to show the convergence of the duration. Again, through a simple Markov inequality for any ε > 0, we can see that the duration converges to 0 in probability. Proof. These can both be seen as near immediate consequences of Proposition 2.4 of Daw and Pender (2019), which provides that Q^∞(m) = Σ_{j=1}^{nm^ν} jY_j in distribution, where Y_j ∼ Pois(λm^(1−ν)/(jµ)) are independent. For the mean, this implies E[Q^∞(m)] = Σ_{j=1}^{nm^ν} j · λm^(1−ν)/(jµ) = λnm/µ. Then, for the variance, we can leverage the independence of the Poisson random variables to similarly see Var(Q^∞(m)) = Σ_{j=1}^{nm^ν} j² · λm^(1−ν)/(jµ) = λnm(nm^ν + 1)/(2µ), and so we complete the proof. By convention, we will take a matrix product as evaluated with the largest index at the left-most position, followed by the second largest to its immediate right, and so on.
Furthermore, we will take a matrix product over an empty set of indices to be the identity matrix. Lemma 1. In the M^n/M/c queue with arrival rate λ, batch size n, and staffing level c ≥ n, the steady-state exceedance probability is equal to an explicit matrix-geometric expression, in which v_i for 1 ≤ i ≤ n is an n-dimensional unit column vector in the i-th coordinate, C(x) ∈ R^(n×n) for x ∈ R is a companion matrix, and s is an n-dimensional column vector defined coordinate-wise for each 1 ≤ i ≤ n. Proof. Let us point to Neuts (1978) for the origins of matrix analytic calculations for the distributions of batch arrival, exponential service queueing systems; we provide this lemma here for simplicity and completeness of the paper, as we are not aware of this precise expression being available previously. Standard CTMC techniques yield that the multi-server steady-state probabilities π_k = P(Q^C = k) will satisfy the balance equations µ(k ∧ c)π_k = λ Σ_{i=1}^{n} π_{k−i}, where π_k = 0 for all k < 0. Equivalently, π_k = (λ/(µ(k ∧ c))) Σ_{i=1}^{n} π_{k−i}. Therefore, any n-dimensional vector of consecutive steady-state probabilities will satisfy a one-step linear recursion driven by the companion matrix, and for k ≤ n this also pins down the initial relationships. Now, we can combine these facts and see that for any k ≥ n + 1 the vector of consecutive probabilities is given by a product of companion matrices. So, we have an explicit product form for any n + 1 ≤ k ≤ c, while for k ≥ c the arguments of the companion matrices cease to change, leaving a fixed matrix power. This gives us all we need to simplify to the stated expression. In particular, π_0 can be found through the fact that the full distribution must sum to 1. Finally, by recognizing that Σ_{k=0}^{n} Γ(λ/µ + k)/(Γ(λ/µ) Γ(k + 1)) = Γ(λ/µ + n + 1)/(Γ(λ/µ + 1) Γ(n + 1)), we simplify to the stated expression. Now that we have developed an understanding of queues with large batch arrivals through connections to storage processes, in this section we will leverage this insight and use the storage process models to staff the batch arrival queue. In Section 3, we defined the staffing problem as finding a staffing threshold c such that the desired exceedance probability, P(Q_t^C(n) ≥ cn), is smaller than some target ε > 0. We have also discussed how one could consider the probability of other, stricter events, like P(Q_t^C(n) + B_i(n) ≥ cn). By normalizing these events by n, the batch scaling limit in Theorem 2 yields the convergence of these probabilities as n → ∞, allowing us to "staff" the storage process instead. One general approach to this problem would be to take a simulation-based approach, such as the well-known iterative staffing algorithm introduced in Feldman et al. (2008). It is thus worth noting that the results of Theorems 1 and 2 have the immediate consequence of greatly simplifying the simulation of batch arrival queueing systems. For large batch sizes, one can simply simulate a storage process instead. This only requires generating random variables for the arrival epochs and jump sizes; one need not simulate service durations. In the large batch setting, this can deliver substantial savings in computational cost, as large batches mean that a large number of service durations must be generated. To draw upon results from the storage process literature and calculate explicit staffing levels, we will now assume that we are in the Markovian setting, with N_t as a Poisson process with rate λ > 0 and with exponential service at rate µ > 0.
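To illustrate the simulation shortcut just described, the following Python sketch simulates the Markovian storage process directly from arrival epochs and jump sizes, using the threshold release rule r(x) = µ min(x, c) between jumps, and estimates the long-run fraction of time spent at or above the threshold c. This is our own minimal sketch, not the paper's algorithm: the deterministic unit jumps, the function names, and all parameter values are assumptions made for demonstration.

import math
import numpy as np

rng = np.random.default_rng(2)

def drain(x, c, mu, dt):
    # Release rule r(x) = mu * min(x, c): linear drain at rate mu*c
    # while above the threshold, exponential decay at rate mu below it.
    if x > c:
        t_hit = (x - c) / (mu * c)      # time until the level reaches c
        if dt <= t_hit:
            return x - mu * c * dt
        x, dt = c, dt - t_hit
    return x * math.exp(-mu * dt)

def frac_time_above(lam, mu, c, jump, horizon=100_000.0):
    # Long-run fraction of time the storage process spends at or
    # above c, an estimate of the steady-state exceedance P(psi_C >= c).
    t = x = above = 0.0
    while t < horizon:
        dt = rng.exponential(1.0 / lam)     # inter-jump time
        if x > c:
            above += min(dt, (x - c) / (mu * c))
        x = drain(x, c, mu, dt) + jump()    # decay, then add the jump
        t += dt
    return above / horizon

# Deterministic unit jumps (M_1 = 1, the limit of fixed batches);
# stability requires lam * E[M_1] < c * mu.
print(frac_time_above(lam=2.0, mu=1.0, c=3.0, jump=lambda: 1.0))

Note how no service durations are generated at all, which is precisely the computational saving highlighted above; swapping the jump sampler for, say, exponential marks recovers the limit of geometrically distributed batches discussed below.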
In this case, we are able to make use of a closed form expression for the moment generating function of the shot noise process. Following standard stability assumptions for multi-server queueing models, we will also suppose λE[B_1(n)] < cnµ for all n ∈ Z+, and we suppose that in the limit we have λE[M_1] < cµ as well. Thus, the objects we use to determine the staffing levels will be the storage and shot noise processes in steady-state. We denote these as ψ^C and ψ^∞, respectively. We now cite a result from the storage process literature providing integral equations for the steady-state densities of ψ^∞ and ψ^C in Lemma 2. Lemma 2. The steady-state density of the shot noise process, f^∞(·), exists and is given by the unique solution to an integral equation holding for all x > 0. Furthermore, the steady-state density of the storage process, f^C(·), exists and is likewise given by the unique solution to an integral equation holding for all x > 0. Proof. This follows directly from Theorem 5 of Brockwell et al. (1982). As an alternate representation of the integrals in Lemma 2, we can observe that in the case of the threshold storage process, for example, the equation can be written in terms of tail probabilities, since ∫_0^∞ P(M_1 > x − y) f^C(y) dy = P(M_1 + ψ^C > x) and P(M_1 > x − y) = 1 for all y > x. This expression will be of use to us in relating the two processes, further enabling us to use the shot noise process to understand the threshold storage process, just as we have used the infinite server queue to understand the multi-server queue. To begin, in Theorem 5 we will now use this alternate expression to justify our study of the stationary setting through a validation of the interchange of the limits of time and of the batch scaling. Theorem 5. In the stationary Markovian infinite server and delay queueing models, the interchange of limits of time and batch scaling is justified. That is, Equations (26) and (27) hold for all x > 0. Proof. For the infinite server queueing model, this interchange can be quickly observed through differential equations for the moment generating functions of Q_t^∞(n) and ψ_t^∞. Let M_n(θ, t) be the moment generating function of the scaled Markovian infinite server queue, i.e. M_n(θ, t) = E[e^((θ/n)Q_t^∞(n))]. Then, M_n(θ, t) satisfies a differential equation in t, and for any n ∈ Z+ the moment generating function of the steady-state queue, say M_n(θ, ∞), will be given by the solution to the time-equilibrium ordinary differential equation. As n → ∞, the limiting steady-state object will then satisfy the corresponding limiting equation. By comparison, the moment generating function of the shot noise process that arose in the Markovian case of the batch scaling in Theorem 2, say M_ψ(θ, t), will satisfy an analogous differential equation, which implies that in steady-state the shot noise process moment generating function, say M_ψ(θ, ∞), is given by the solution to the same time-equilibrium equation. Hence, M_ψ(θ, ∞) = M_∞(θ, ∞), justifying Equation (26). To now prove Equation (27), we start by describing the balance equations for the queue. Letting π_i^n = lim_{t→∞} P(Q_t^C(n) = i) for every i ∈ N, we have that these steady-state probabilities satisfy (λ + µ(i ∧ cn))π_i^n = λ Σ_{j=1}^{i} P(B_1(n) = j) π_{i−j}^n + µ((i + 1) ∧ cn) π_{i+1}^n, for any n ∈ Z+. By induction, we can observe that this implies that the probabilities satisfy the recurrence relation µ(i ∧ cn)π_i^n = λ Σ_{j=1}^{i} P(B_1(n) ≥ j) π_{i−j}^n for all i ∈ Z+. At i = 1 this follows immediately from the global balance equation for π_0^n, so we proceed to the inductive step and assume that the hypothesis holds on i ∈ {1, . . . , k} for some k ∈ Z+.
Then, through this assumption and the balance equation for π_k^n, we can observe that λπ_k^n + λ Σ_{j=1}^{k} P(B_1(n) ≥ j) π_{k−j}^n = λ Σ_{j=1}^{k} P(B_1(n) = j) π_{k−j}^n + µ((k + 1) ∧ cn) π_{k+1}^n, and since P(B_1(n) ≥ 1) = 1, this simplifies to µ((k + 1) ∧ cn) π_{k+1}^n = λπ_k^n + λ Σ_{j=1}^{k} P(B_1(n) ≥ j + 1) π_{k−j}^n = λ Σ_{j=1}^{k+1} P(B_1(n) ≥ j) π_{k+1−j}^n, which completes the induction. With this confirmation of the recursion, let us now observe an alternate representation of the summation within it. That is, for Q^C(n) as the delay model in steady-state, one can note through the law of total probability that P(B_1(n) + Q^C(n) ≥ i) = Σ_{j=0}^{∞} P(B_1(n) ≥ i − j) π_j^n = Σ_{j=0}^{i−1} P(B_1(n) ≥ i − j) π_j^n + P(Q^C(n) ≥ i), since P(B_1(n) ≥ i − j) = 1 for all j ≥ i. This then implies that one can re-express the recurrence relation as π_i^n = (λ/(µ(i ∧ cn)))(P(B_1(n) + Q^C(n) ≥ i) − P(Q^C(n) ≥ i)), and we can now use this to give a representation for F_n(x) ≡ P(Q^C(n) ≤ xn). Since F_n(x) = Σ_{i=0}^{⌊xn⌋} π_i^n, we can substitute this re-expressed recurrence into the sum. By changing the step size of the summation to increments of 1/n, this sum becomes a Riemann-type approximation of an integral. Letting Y be equivalent in distribution to the limiting object of Q^C(n)/n as n → ∞, we have that F_∞(x) = P(Y ≤ x) is given by the limiting integral equation for all x > 0, since π_0^n → 0 and B_1(n)/n converges in distribution to M_1 as n → ∞. Using Lemma 2 and the alternate representation of the integral in Equation (25), one can note that F^C(x) = P(ψ^C ≤ x) will be given by the same equation. From Lemma 2 we have that F^C(x) is the unique distribution satisfying this equation, and thus Y is equal in distribution to ψ^C, completing the proof. Having now justified the interchange of limits, we will develop an asymptotic approach to calculate the exceedance probabilities for general batch sizes in Subsection D.1 through use of the results from the storage process literature. It is worth noting that in specific settings the integral equations in Lemma 2 can yield results directly. An example of this is the case of exponentially distributed marks, which arise as the limit of geometrically distributed batches. To calculate the exceedance probabilities for ψ^C, we will again draw upon its relationship with the tractable shot noise process, ψ^∞. Furthermore, we will also make use of a transform method for computing the cumulative distribution function and truncated expectation of a random variable through use of orthogonal Legendre polynomials. This approach is based on a generalization of Sullivan et al. (1980), in which the authors provide a representation for the indicator function through a sum of exponential functions. In Appendix E, we extend this result for use in studying continuous random variables. Through use of the resulting Lemma 3, we derive expressions for the exceedance probabilities in Theorem 6. Theorem 6. In the Markovian case, the threshold exceedance probabilities for ψ^C are given by limiting series expressions in terms of the shot noise process. Then, by observing that P(M_1 + ψ^∞ > x | ψ^∞ ≤ c) ≥ P(M_1 > x) through the independence of the two quantities and the fact that each is positive, we furthermore have a complementary inequality. Using the decomposition in Equation (33), this now yields the upper bound. This bound is most helpful in cases of small λ, as in that case ψ^∞ is likely to be small. To support our general batch analysis, we will now introduce a technical lemma that extends Sullivan et al. (1980) to a probabilistic context. In Sullivan et al.
(1980), the authors use shifted, asymmetric Legendre polynomials to produce a sum of exponential functions of x ≥ 0 that converges to the indicator function 1{x ≤ c} for any constant c > 0. These approximations make use of the generalized hypergeometric function ₃F₂(·), which is defined by the series ₃F₂(a₁, a₂, a₃; b₁, b₂; z) = Σ_{i=0}^{∞} ((a₁)_i (a₂)_i (a₃)_i)/((b₁)_i (b₂)_i) · z^i/i!, where (c)_i = Π_{j=0}^{i−1} (c + j) is a rising factorial. By use of the dominated convergence theorem, in Lemma 3 we generalize this result using a sum of moment generating functions of a continuous non-negative random variable. We find convergence to the cumulative distribution function of the random variable, as well as to the expectation of the product between the random variable and an indicator function. Therefore, this yields a family of approximations indexed by a truncation level m, and so we will consider candidate m values, which we plot in Figure 5. As one can see, for relatively small values of m the approximation performs quite well, as the simulated values and the approximation are virtually indistinguishable before the true probability is approximately of order 10^(−5). However, if desired, we can improve this further by taking the average among the candidate approximations. We can see that this does well in this example, and we can quickly show it will do no worse than the worst individual approximation: for p as the true probability and p_{σ,m} as the approximation at truncation level m, the triangle inequality bounds the average's error by the largest of the individual errors. Thus, a loose description of an approximation heuristic based on these Legendre limits is as follows: compute multiple candidate approximations, remove clear errors caused by numerical instabilities and pre-convergence gaps, and take the average of the remaining candidates. While our experiments suggest that this simple approach does well, we can note that it could be possible to develop more sophisticated numerical approximations based on these limits, and we find this to be an interesting direction of future research. For a final numerical experiment, let us also explore dependence within batches of jobs. As an empirical exploration of this, in Figure 6 we plot normalized queue length processes under three different dependency structures; a simulation sketch of the three schemes follows below. In each setting, there is a probability ρ ∈ [0, 1] that each successive service time will be dependent. In the first case, (a), each service time in a batch has probability ρ of being equal to the first duration within that batch and otherwise will be drawn independently; i.e., for an arbitrary batch i and j ≥ 2, S_{i,j} = S_{i,1} with probability ρ and S_{i,j} = S̃_{i,j} otherwise, where S̃_{i,j} is an independent draw from the service distribution. In case (b), the duration is instead equal to the previous time with probability ρ and independent otherwise, meaning S_{i,j} = S_{i,j−1} with probability ρ and S_{i,j} = S̃_{i,j} otherwise. In case (c), each time is an average over all previous service times within the batch with probability ρ, and is again otherwise independently drawn. In each of these settings, we plot simulated sample paths for ρ ∈ {0, 0.1, 0.5, 0.9, 1}, and we hold the arrival epochs fixed across all the experiments. When ρ = 0, all durations are independent regardless of the dependency setting, and thus these processes are effectively identical on this sample path due to the results in Theorem 2. Similarly, if ρ = 1, the service times within each batch are perfectly correlated. Moreover, these processes are equivalently distributed across the dependency settings.
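For concreteness, here is a minimal Python sketch of the three dependency schemes; the scheme labels follow the text, while the function name, the exponential service distribution, and the sample sizes are our own assumptions for illustration.

import numpy as np

rng = np.random.default_rng(3)

def batch_service_times(n, mu, rho, scheme):
    # Service times for one batch of n jobs under dependency
    # structures (a), (b), (c); independent draws are exponential(mu)
    # here purely as an assumed example distribution.
    s = rng.exponential(1.0 / mu, size=n)   # the independent draws S~_{i,j}
    for j in range(1, n):
        if rng.random() < rho:
            if scheme == "a":
                s[j] = s[0]            # copy the first duration in the batch
            elif scheme == "b":
                s[j] = s[j - 1]        # copy the previous duration
            elif scheme == "c":
                s[j] = s[:j].mean()    # average of all previous durations
    return s

for scheme in ("a", "b", "c"):
    s = batch_service_times(n=10_000, mu=1.0, rho=0.5, scheme=scheme)
    print(scheme, round(s.mean(), 3), round(s.var(), 3))

Comparing the sample variances across schemes gives a quick sense of how differently the three structures concentrate the durations within a batch, which is what drives the differing path behavior in Figure 6.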
In the case of infinitely many servers, the normalized queue length process under perfect correlation can be trivially identified as a piecewise constant jump process (and in the case of deterministic batch sizes, this is an infinite server queue). However, in the multi-server case the batch arrival queue is not as easily understood, and even more insight is lost in the intermediate settings of (c). This illustrates that batch scaling limits subject to dependency within batches may merit their own future study. It is worth noting, though, that in the infinite server setting there are immediately available extensions of Theorem 1. For example, for each arriving batch in case (a), there is a binomially distributed number of jobs that are identical in duration, with the remaining jobs independently drawn. Under the batch scaling limit, this means that a ρ fraction of each jump will contribute to a piecewise constant jump process while the remaining 1 − ρ will function as part of a shot noise process. Moreover, because the limits in Theorem 1 make use of the law of large numbers, one could recover the infinite server batch scaling if the service durations are weakly dependent. Notes 8. The lower end of this interval being open is akin to taking ν < 1, matching the fact that there is a non-degenerate distribution yielded by the pure batch scaling limit. 9. It is straightforward to make a stronger statement for the exogenous burst duration and achieve almost sure convergence to 0 through the Borel-Cantelli lemma, since P(T_{Exo,η} > ε) = e^(−ηβ_{Exo}ε). References The modern call center: A multi-disciplinary perspective on operations management research Laws of large numbers for dependent non-identically distributed random variables A diffusion regime with nondegenerate slowdown Scheduling parallel servers in the nondegenerate slowdown diffusion regime: Asymptotic optimality results Control of fork-join networks in heavy traffic Asymptotically optimal interruptible service policies for scheduling jobs in a diffusion regime with nondegenerate slowdown The suite life of students quarantined at the USC Hotel The fork-join queue and related systems with synchronization constraints: Stochastic ordering and computable bounds Algorithmic methods for multi-server queues with group arrivals and exponential services Capacity sizing under parameter uncertainty: Safety staffing principles revisited A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems Dimensioning large call centers Stationary distributions for dams with additive input and content-dependent release rate Storage processes with general release rule and additive inputs A survey of Monte Carlo tree search methods Sustainable mobility: A vision of our transport future Analytically elegant and computationally efficient results in terms of roots for the GI^X/M/c queueing system How evolutionary selection can train more capable self-driving cars Beating the curse of dimensionality in options pricing and optimal stopping On dams with additive inputs and a general release rule Further results for the queueing system M^X/M/c The war to remotely control self-driving cars heats up On the distributions of infinite server queues with batch arrivals An ephemerally self-exciting point process Shot-noise fluid queues and infinite-server systems with batch arrivals Generalized peakedness of teletraffic processes Staffing of time-varying queues to achieve time-stable performance Autonomous shuttles help transport COVID-19 tests at Mayo
Clinic in Florida Response to a COVID-19 outbreak on a university campus, Indiana Telephone call centers: Tutorial, review, and research prospects A better model for job redundancy: Decoupling server slowdown and job size Redundancy-d: The power of d choices for redundancy Designing a call center with impatient customers Amplitude distribution of shot noise With full Statler, isolated students trickle into off-campus hotels Some universities have less space to isolate students this fall. Is that a problem? Chronicle of Higher Education Coping with time-varying demand when setting staffing requirements for a service system Staffing call centers with uncertain demand forecasts: A chance-constrained optimization approach Heavy-traffic limits for queues with many exponential servers Beyond safety drivers: Applying air traffic control principles to support the deployment of driverless vehicles The stationary distribution and first exit probabilities of a storage process with general release rule The recurrence classification of risk and storage processes Sharp waiting-time bounds for multiserver jobs Modeling and forecasting call center arrivals: A literature survey and a case study Autonomous vehicle market by level of automation (Level 3, Level 4, and Level 5) and component (hardware, software, and service) and application (civil, robo taxi, self-driving bus, ride share, self-driving truck, and ride hail): Global opportunity analysis and industry forecast Server staffing to meet time-varying demand Managing uncertainty in call centres using Poisson mixtures Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Storage processes with Markov additive input and output