key: cord-0554039-b3qlupva authors: Forbes, Michael; Harris, Mitchell; Jansen, Marijn; Schoot, Femke van der; Taimre, Thomas title: Combining Optimization and Simulation Using Logic-Based Benders Decomposition date: 2021-07-18 journal: nan DOI: nan sha: a53a6f6c3a4db7e3862c7d7aa8915b587b638b56 doc_id: 554039 cord_uid: b3qlupva

Operations research practitioners frequently want to model complicated functions that are difficult to encode in their underlying optimization framework. A common approach in the literature is to solve an approximate model, and to use a simulation to evaluate the true objective value of one or more solutions. We propose a new approach -- Branch-and-Simulate -- for integrating simulation into the optimization model itself. Branch-and-Simulate is a variant of logic-based Benders decomposition where we run a simulation at each incumbent solution to the master problem. The simulation data is then used to guide the trajectory of the optimization model itself, via Benders cuts. We test the approach on a class of stochastic resource allocation problems with monotonic performance measures, and derive strong Benders cuts which are valid for all problems of this form. We consider two concrete examples: an airport check-in counter allocation problem, and a nursing home shift scheduling problem. Whereas previously these problems could only be solved approximately, we solve challenging instances to optimality within a reasonable amount of time. In the appendix, we will also use Branch-and-Simulate to approximately solve an ambulance location problem.

In resource allocation problems, the goal is typically to optimally distribute scarce resources among a finite set of objects, such as time periods, or locations in a network. Although the nature of the resources might vary from problem to problem, we are usually interested in how many resources should be allocated to each object, which makes integer programming a natural solution approach. When the objective contribution of each object is a convex function of one variable (that is, the number of resources allocated to that object), then standard techniques might be suitable. We consider more complicated objective functions.

Let Z_+ and R_+ denote the sets of non-negative integers and reals respectively. For a positive integer n, [n] = {1, . . . , n} denotes the set of all integers ranging from 1 up to n. In this paper, we will study optimization problems of the form

$\min_{y \in Y} \; g(y) + \sum_{j \in [n]} c_j f_j(y)$    (1)

where Y ⊆ Z^n_+ is a finite set, g : Z^n_+ → R is a function, and for each j ∈ [n], f_j : Z^n_+ → R_+ is a non-increasing function of each of the variables (and c_j ∈ R_+ are non-negative constants). The elements of [n] represent a collection of objects, and for each j ∈ [n], the jth component of y = (y_1, . . . , y_n) represents the number of resources allocated to the jth object. The objective function consists of a cost function g for the resources, and for each j ∈ [n], a performance measure f_j for the jth object. Our task is to find a resource allocation, y, which minimizes the combined cost and performance. The importance of the monotonicity property for each f_j will be made clear soon, but first, consider an example.

Example 1. We are given a finite set of jobs, and n adjacent time periods of finite length. Each job has a release time and a processing time.
We can decide how many staff will work in each time period, subject to some constraints, and the goal is to minimize the total delay (start time minus release time) of all of the jobs. The jobs are processed as staff become available, and are scheduled on a first-come-first-serve (FCFS) basis. For all j ∈ [n], let y_j be an integer variable representing the number of staff working in time period j, and let Delay_j(y) be the total delay of all jobs released in time period j, given the staffing vector y = (y_1, . . . , y_n). In Section 4 we will prove that Delay_j is a non-increasing function of all of the variables. In other words, increasing the number of staff available in any time period cannot increase the delay of any job. ♦

This paper is motivated by several properties of this example. Firstly, suppose that at some point in continuous time, not enough staff are available to start processing all of the jobs which have been released. Then we get a backlog of delayed jobs. So for each j, Delay_j(y) depends not only on the number of staff available in time period j, but on other time periods as well. It depends on previous time periods if the staff available in time period j are busy processing the backlog, and on future time periods if the jobs released now manage to join the backlog. Since the exact dependencies are not known a priori, Delay_j really is a function of all of the decision variables; not just y_j. Be that as it may, in practice, the delay tends to depend strongly on the neighbouring time periods, and weakly or not at all on the more distant ones. We will make this notion more precise later on.

Secondly, Delay_j does not have a closed-form analytic expression. While it is natural to try to model the full problem as one large mixed-integer program (MIP), the delay functions inherit a queuing structure from the set of jobs. Incorporating this structure would require many new variables and constraints, and the resulting formulation would be prohibitively difficult to solve. On the other hand, for a fixed choice of the staffing vector y, we can evaluate each Delay_j(y) during a discrete-event simulation of the job queue. The usual approach in the literature is to solve an easier problem which approximates the queuing structure, and then to use simulation modelling to evaluate the actual delay associated with one or more solutions. Naturally this comes at the expense of an optimality guarantee. Simulating the performance of every feasible solution is clearly a bad idea, but thanks to the monotonicity property, the output of one simulation provides useful lower bounds on the performance of other solutions. This motivates us to consider a Benders decomposition approach instead.

To solve problems of the form (1), we will exploit the general properties exemplified by Example 1. The performance measures are functions of all the decision variables, but in practice depend on only a few of them. And while they lack a closed-form analytic expression, for fixed solutions they can be evaluated during a simulation. Finally, they are non-increasing (or non-decreasing for maximization problems); in other words, adding more resources cannot make the performance worse. We can accommodate almost arbitrary constraints. We will consider two concrete applications: an Airport Check-in Counter Allocation (ACCA) problem, and a Nursing Home Shift Scheduling (NHSS) problem.
For the ACCA problem, the task is to determine how many check-in counters to open at an airport check-in terminal in order to minimize the combined operational costs and service costs, which are measured by queuing time. For the NHSS problem, the task is to select and schedule care worker shifts in order to minimize the total delay of all patient requests. Brief domain-specific literature reviews have been deferred to the relevant sections.

The approach itself is a variant of logic-based Benders decomposition (LBBD). Hooker (2000) and Hooker and Ottosson (2003) introduced LBBD, which is an ambitious generalization of Benders (1962)'s classical decomposition strategy for solving MIP problems. Like classical Benders decomposition, the original problem is partitioned into a master problem and a set of subproblems. The master problem is a relaxation of the original problem, and we use new variables to approximate the original objective function. At incumbent solutions to the master problem, the subproblems are solved to generate Benders cuts, which gradually refine the approximation. In our case, the resource allocation decisions will be left in the master problem, and the computation of the performance measures will be relegated to the subproblems. The master problem is usually a MIP, and we "solve" the subproblems using simulations. To apply classical Benders decomposition, the subproblem must be a linear program (LP), and cuts are derived by solving the LP dual. Generalized Benders decomposition, which was introduced by Geoffrion (1972), is an extension of the technique to certain nonlinear programming problems, which uses convex duality theory instead. With LBBD on the other hand, the subproblems can take the form of any optimization problem. Although this flexibility has allowed the principles of Benders decomposition to be applied to a much larger set of problems, one weakness is that the structure of the Benders cuts is problem-specific. Indeed, finding suitable Benders cuts sometimes requires considerable effort, because we lack a convenient theory of duality when the subproblem is arbitrary. In the case of (1), our assumptions are weak enough to capture a wide range of practical resource allocation problems, and yet, we will still derive strong Benders cuts which are provably valid for the entire class of problems.

Although we will focus on resource allocation problems, this paper serves as a proof-of-principle for a broader methodological framework that is new to the operations research literature. The idea is to simulate the performance of each incumbent solution to the relaxed problem. The simulation data is then used to guide the trajectory of the optimization model. Strong logic-based Benders cuts ensure (i) that we converge to an exact solution to the original problem, and (ii) that the number of solutions we actually need to simulate is relatively small. We will refer to this principle as Branch-and-Simulate throughout. Our computational results show that Branch-and-Simulate significantly improves upon the state-of-the-art for the applications considered. For now, in §1.3, we will briefly discuss stochastic optimization, before reviewing some of the relevant literature.

When modelling real-world phenomena, the exact value of the performance measures frequently depends on information that cannot be known ahead of time.
For instance, with Example 1, we might be required to allocate the staff to their shifts before the exact release times, processing times, or even the number of jobs is known. To model this situation mathematically we assume f_j also depends on one or more random variables, R. The problem to be solved is then

$\min_{y \in Y} \; g(y) + \sum_{j \in [n]} c_j \, \mathbb{E}\big[f_j(R, y)\big]$    (2)

where E denotes the expected value operator. This is a stochastic integer programming problem, and trying to calculate the exact expectation is hopeless, because the performance measures lack any convenient analytic structure. A standard approach in this situation is to calculate the average performance over a finite number of samples. We either assume that R can be sampled efficiently, or if the context allows it, that we have access to historical data. Let S be the set of scenarios; that is, an indexing set for the samples. For each s ∈ S, let r_s ∼ R denote the sth sample, and define f_sj(y) = f_j(r_s, y). The |S|-sample-average approximation of (2) is then

$\min_{y \in Y} \; g(y) + \frac{1}{|S|} \sum_{s \in S} \sum_{j \in [n]} c_j f_{sj}(y)$    (3)

which is a deterministic optimization problem. Recalling Example 1, we assume f_sj is a non-increasing function for each object and each scenario. To obtain a reliable approximation we would like to use a relatively large number of samples, which makes it even more painful to use a direct approach. Note that (3) no longer depends explicitly on R. Therefore the approach is independent of certain distributional assumptions, which frequently limit the applicability of a model. The only requirement is that we can (at least approximately) sample from the underlying distribution.

In order to keep notation as simple as possible, in Sections 2 and 3 we will study the following generalization of both (1) and (3):

$\min_{y \in Y} \; g(y) + \sum_{k \in K} c_k f_k(y)$    (4)

where K is a finite non-empty set. In other words, we do not require there to be a bijection between the performance measures and the objects. That said, it will be useful to assume that each performance measure is associated with a specific object, which we express using a function π : K → [n]. In other words, π(k) is the object associated with the kth performance measure. For instance, with Example 1, the object associated with Delay_t is t itself.

Classical Benders decomposition was introduced by Benders (1962), and we refer the reader to Rahmaniani et al. (2017) for an extensive survey of applications. Hooker (2000) and Hooker and Ottosson (2003) observed that the concept underlying Benders decomposition could be applied in a much more general setting. Since then, LBBD has been applied with remarkable success to a diverse range of problems. We refer the reader to Hooker (2019) for a survey of applications. Classical Benders decomposition has been applied to two-stage stochastic programming problems with linear recourse by van Slyke and Wets (1969), and with integer recourse by Laporte and Louveaux (1993). In the case of integer recourse, Benders cuts are derived by solving the dual of the LP relaxation. To ensure finite convergence, they are supplemented with combinatorial cuts that eliminate previous master solutions. This approach suffers from extremely slow convergence if the subproblem has a weak linear relaxation, because combinatorial cuts may be needed at many or most master solutions. LBBD, on the other hand, does not rely on LP duality to derive Benders cuts.
Elci and Hooker (2020) applied LBBD to two-stage stochastic planning and scheduling problems, where the master problem is an assignment problem that is solved using mixed-integer programming, and the subproblems are scheduling problems, where constraint programming excels. Benders cuts are derived by solving the so-called "inference dual" of the subproblems. We refer the reader to Hooker and Ottosson (2003) for an explanation of the inference dual, since we will not need it. Lombardi et al. (2010) use LBBD to solve a stochastic allocation and scheduling problem for conditional task graphs in multi-processor embedded systems. The master problem assigns tasks to chips, and tasks are scheduled in the subproblems. The subproblems communicate with the master problem via so-called "no-good cuts," which we will explain in Section 2. Fazel-Zarandi et al. (2012) study a stochastic facility location and vehicle assignment problem, where the subproblems communicate with the master problem via feasibility cuts. Guo et al. (2021) use LBBD to solve a stochastic extension of the distributed operating room scheduling problem, which aims to find an assignment of surgeries to operating theatres. In that problem, the subproblems are always feasible, and logic-based Benders optimality cuts are used instead. We will explain the difference between feasibility and optimality cuts in Section 2. This literature review is continued in Sections 5 and 6 where we focus on our considered applications.

In Section 2 we will give an outline of LBBD, formally decompose (4), and discuss some implementation details. In Section 3 we will derive a sequence of Benders cuts, of increasing strength, which are valid for all problems of the form (4). We will also derive two families of valid inequalities, or initial cuts, which can be added to the master problem. The rest of the paper will focus on applications. In Section 4 we will expand on Example 1 in greater detail. We will prove that the problem class has the required structure. In Section 5 we introduce our first concrete application: an ACCA problem. In §5.2 we present the results of some computational experiments, with an emphasis on comparing the relative strength of the Benders cuts. Instances derived from the literature are solved with 100 scenarios in a reasonable amount of time. This is the first approach to solving instances of this kind which is both scenario-based, and exact (at the level of the scenarios). In Section 6 we will introduce the NHSS problem, with computational results in §6.2. The story is similar here. We conclude with a discussion of the results, and some avenues for further research. In Appendix C we will consider an Ambulance Location problem (ALP). There the monotonicity condition is only approximately satisfied. We can nevertheless obtain a strong approximation using Branch-and-Simulate.

Branch-and-Simulate is a variant of logic-based Benders decomposition. In this section we will outline LBBD in general. Consider an optimization problem of the following form, which we call the original problem (OP):

$\min_{x, y} \; f(x, y) \quad \text{subject to} \quad C(x, y), \; x \in X, \; y \in Y$    (OP)

where X and Y are sets, f : X × Y → R is a function, and C(x, y) is a list of constraints which link the x and y variables. Any optimization problem can be written in this form by arbitrarily partitioning the variables, but in practice, the x variables are whichever ones we want to omit from the master problem.
Since optimization problems typically become exponentially more difficult to solve as the number of variables and constraints increases, it is likely that OP becomes much easier when either the x or y variables are fixed. In fact, we could solve OP to optimality by solving this "subproblem" in x for all fixed y ∈ Y and choosing the best objective value. This is clearly impossible if Y is an infinite set, and extremely inefficient otherwise. Instead, Benders' approach is to replace f(x, y) and C(x, y) with a new continuous variable θ, which is intended to approximate the optimal objective value. The new problem is referred to as the master problem (MP), and it has the following form:

$\min_{\theta, y} \; \theta \quad \text{subject to} \quad y \in Y, \; \theta \in \mathbb{R}.$    (MP)

If the original objective function is separable, we can use different variables to approximate each component. Any part of the objective which we do not want to approximate can remain in the master problem. These are straightforward extensions. The optimal objective value for MP is a lower bound on that of OP. Currently this is obvious since MP is unbounded, but we will ensure this property is preserved whenever the master problem is modified. Now, for each fixed y ∈ Y we have a subproblem, denoted SP_y, which contains the information omitted from the master problem. The idea behind Benders decomposition is to gradually refine the master approximation by adding Benders cuts (new constraints) which are derived from solutions to the subproblems.

Indeed, let θ̄, ȳ be an optimal solution to MP after adding any Benders cuts generated so far. After solving SP_ȳ, there are two types of Benders cuts we might need. Since the x and y variables are linked by the constraints, SP_ȳ may be infeasible. In that case, even though ȳ is feasible for MP, if (x, y) is feasible for OP then y ≠ ȳ. In other words, the feasible region for the master problem is too large. To correct this, we add a feasibility cut to the master problem. A feasibility cut on ȳ is an inequality of the form F_ȳ(y) ≤ 0, where F_ȳ : Y → R is a function with the following properties: F_ȳ(ȳ) > 0, so that ȳ is cut off, and F_ȳ(y) ≤ 0 for every y ∈ Y for which SP_y is feasible, so that no feasible solutions are lost. After adding a feasibility cut, we can re-solve the master problem, and repeat until we get a feasible solution. In the worst-case scenario, we can use a no-good cut, which is equivalent to the statement "y ≠ ȳ." In other words, we "cut off" the current, "no good" solution, but only that solution. Ideally we would like to eliminate as many infeasible solutions at a time as possible. How we do this (and if we can do this) depends strongly on the structure of the problem.

Now suppose SP_ȳ is feasible. If SP_ȳ is unbounded, then so is OP, and we can stop. Otherwise, let x̄ be an optimal solution to SP_ȳ. It could be the case that θ̄ < f(x̄, ȳ), which means the current master constraints are not tight enough. We want to add an optimality cut to the master problem. An optimality cut on θ and ȳ is an inequality of the form B_ȳ(y) ≤ θ, where B_ȳ : Y → R is a function with the following properties: B_ȳ(ȳ) = f(x̄, ȳ), and for every y ∈ Y, B_ȳ(y) is no greater than the optimal objective value of SP_y. In other words, the cut is exact at the current solution, and an underestimate elsewhere. By exactness, if ȳ is ever optimal for MP again, this time θ will correctly estimate the objective value, and we can stop. The strength of an optimality cut, however, is determined by how tight it is at other feasible solutions. We get a trivial Benders cut (what we might call a no-good optimality cut) if we define B_ȳ(ȳ) = f(x̄, ȳ), and B_ȳ(y) = −∞ for all y ∈ Y \ {ȳ}, but this reduces Benders decomposition to an exhaustive search.
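To make the iteration concrete, the following is a minimal Python sketch of one generic LBBD loop of the kind just described. The helper names (solve_master, solve_subproblem, and the two cut constructors) are placeholders for problem-specific routines and are not taken from the paper's code; the loop simply alternates between optimizing the master problem and checking the incumbent against the subproblem.

```python
# Minimal sketch of a generic LBBD loop (placeholder helper functions).
# solve_master(cuts) returns an optimal (theta, y) for the master problem
# including all cuts collected so far; solve_subproblem(y) returns the true
# objective value of y, or None if SP_y is infeasible.

def lbbd(cuts, solve_master, solve_subproblem,
         feasibility_cut, optimality_cut, tol=1e-6):
    while True:
        theta, y = solve_master(cuts)       # optimal value is a lower bound on OP
        value = solve_subproblem(y)         # evaluate SP_y at the incumbent y
        if value is None:                   # SP_y infeasible: add a feasibility cut
            cuts.append(feasibility_cut(y))
        elif value > theta + tol:           # theta is too optimistic at y
            cuts.append(optimality_cut(y, value))
        else:                               # theta matches the true value: stop
            return y, value
```

The loop terminates with an optimal solution precisely when the master estimate agrees with the subproblem value, which is the situation formalized next.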
Since optimality cuts are underestimates, the optimal master objective value is always a lower bound on that of the original problem. The following lemma is an immediate consequence.

Lemma 2. Let θ̄ and ȳ be an optimal solution to MP with all additional Benders cuts, and let x̄ be an optimal solution to SP_ȳ, if one exists. If f(x̄, ȳ) = θ̄, then (x̄, ȳ) is an optimal solution to the original problem.

A generic LBBD procedure is specified in Algorithm 1. The challenge of LBBD, however, is in deriving strong feasibility and/or optimality cuts. This is the role that "logic," or more precisely, logical inference, plays in logic-based Benders decomposition. In each iteration, we want to use the current subproblem solution to infer valid bounds at other solutions. In the classical case this is accomplished using the LP dual. In our case, we accomplish this by exploiting the monotonicity property of the performance measures. Note that there is no general guarantee that Algorithm 1 terminates. However, if Y is finite, then Algorithm 1 is guaranteed to terminate, since there are at most finitely many subproblems to enumerate. As we will see, however, the rate of convergence depends strongly on the strength of the Benders cuts.

To conclude this section, we will formally decompose (4), and discuss some crucial implementation details. Recall that the problem to be solved is

$\min_{y \in Y} \; g(y) + \sum_{k \in K} c_k f_k(y)$    (4)

where K is a finite non-empty set. For each k ∈ K we introduce a new non-negative continuous variable θ_k intended to approximate f_k(y) in the objective function. We will leave all of the decision variables in the master problem. This is atypical of Benders decomposition, but not formally an issue. The master problem is

$\min_{\theta, y} \; g(y) + \sum_{k \in K} c_k \theta_k, \quad \text{subject to} \quad y \in Y, \; \theta \in \mathbb{R}^K_+.$    (5)

If g is a linear function, and Y = Z^n_+ ∩ P for some polyhedron P ⊂ R^n, then the master problem is a MIP. In that case, instead of solving (5) from scratch in each iteration, we can add Benders cuts at nodes of a single branch-and-bound tree. That said, while all of the examples we will consider are MIPs, the solution strategy we propose is independent of how we solve the master problem; we just assume that we can. Note that finding a feasible allocation y ∈ Y may be non-trivial in its own right.

The subproblem involves evaluating the performance measures for the current master solution. While formally a minimization problem, it does not need to be treated like an optimization problem at all, because it has no constraints or variables. We can "solve" the subproblem by computing f_k(y) for each k ∈ K using the current master solution. In the case of Example 1, we can run a discrete-event simulation of the job queue using the current solution. If θ̄ and ȳ are master optimal, and θ̄_k = f_k(ȳ) for each k ∈ K, then we can terminate with the optimal solution. This is an easy corollary of Lemma 2.

Algorithm 2 (callback routine at the incumbent solution θ̄, ȳ):
1: Compute f_k(ȳ) for all k ∈ K
2: for k ∈ K do
3: if θ̄_k < f_k(ȳ) then
4: Generate a Benders cut, θ_k ≥ B_ȳ(y)
5: Add the cut to the model as a lazy constraint
6: if at least one lazy constraint was added then
7: θ′ ← (f_k(ȳ))_{k ∈ K}
8: Pass θ′, ȳ to the solver

We will not need feasibility cuts as long as we can solve the master problem. However, if finding a feasible solution is difficult, it may be beneficial to solve a relaxation of the master problem and use feasibility cuts.
This may even be essential if Y is defined in terms of qualitative constraints on the performance, which cannot be expressed using elementary functions of the y variables.

Now suppose the master problem is a MIP which can be solved using branch-and-bound (B&B). Rebuilding a B&B tree from scratch in each iteration is often cumbersome and wasteful. In practice, we will add Benders cuts as lazy constraints during a callback routine. In this context, it is easy to see why stronger cuts yield a faster algorithm. Consider the B&B tree of MP. A priori, the best known lower bound on each solution is trivial. Therefore, with no-good cuts, we must build the entire tree in order to prove an optimal solution. But, if our cuts impose non-trivial bounds at other solutions as well, then we already have tighter lower bounds in other nodes. Therefore, we might be able to fathom some nodes without explicitly enumerating all of their child nodes. The more solutions we can infer valid bounds for at a time, and the tighter those bounds are, the faster we can prune the B&B tree, and prove optimality for MP.

We begin by initializing MP in our favourite solver and introducing any additional valid inequalities. Then the callback routine described in Algorithm 2 should be applied at each incumbent integer solution. If θ_k are optimal for MP in its current form, and no lazy constraints are added during the callback, then each θ_k correctly estimates the kth performance measure for the incumbent solution, in which case we can stop. On the other hand, if at least one cut is added, then the incumbent solution will be cut off by the lazy constraints. Since we know the true objective value of the incumbent solution, we can pass it to the solver in a primal heuristic. In the next section we will derive optimality cuts that are valid for (4).

Having formally decomposed (4) in Section 2, it remains to derive valid optimality cuts. In this section, θ̄ and ȳ will denote the incumbent master solution, and k is a fixed performance measure. We will derive a sequence of cuts of increasing strength, starting with the no-good cut, which, for the reasons discussed above, is inadequate. To accomplish all this, we need to convert the resource levels into binary variables. To that end, let m and M be lower and upper bounds for the number of resources which can be allocated to any object, and let N = {m, . . . , M} be the set of levels. For all j ∈ [n] and ξ ∈ N, define a function z_ξj : Z^n_+ → {0, 1} with z_ξj(y) = 1 if and only if y_j = ξ, and z_ξj(y) = 0 otherwise. In practice we can introduce binary variables z_ξj, and add the following linear constraints to the master problem:

$\sum_{\xi \in N} z_{\xi j} = 1 \quad \text{and} \quad \sum_{\xi \in N} \xi \, z_{\xi j} = y_j, \quad \text{for all } j \in [n].$

These ensure that z_ξj = 1 if and only if precisely ξ resources are allocated to the jth object. The no-good optimality cut is

$\theta_k \ge f_k(\bar{y}) \Big( 1 - \sum_{j \in [n]} \sum_{\xi \in N \setminus \{\bar{y}_j\}} z_{\xi j}(y) \Big).$    (6)

To see that (6) is a valid optimality cut, note that the expression in parentheses is equal to 1 if and only if y = ȳ. Otherwise the expression is non-positive, and the inequality is trivial. The no-good cut corrects our estimate at the current solution, but only that solution. Using (6), Algorithm 2 degenerates to an exhaustive search. Be that as it may, (6) contains the same idea which underlies the stronger cuts. The bound we want to enforce on θ_k, namely f_k(ȳ), is computed in the subproblem. We then multiply this bound by a logical expression (an on/off switch, so to speak) which deactivates the cut at any solution where it is not known to be valid.
A priori, the only solution we know f_k(ȳ) to be a valid bound for is ȳ itself, which is reflected in the no-good cut. We can use the monotonicity of f_k to prove that f_k(ȳ) is a valid bound at other solutions as well. In particular, since f_k is non-increasing, if we decrease the number of resources allocated to any object, we cannot possibly reduce the value of f_k. This proves that

$\theta_k \ge f_k(\bar{y}) \Big( 1 - \sum_{j \in [n]} \sum_{\xi > \bar{y}_j} z_{\xi j}(y) \Big)$    (7)

is also a valid optimality cut. We simply removed the indicator variables which correspond to decreasing the resource levels, since the subproblem bound is still a valid underestimate for those solutions.

While (7) is stronger than (6), we may be able to do even better if we are willing to simulate a few more solutions. Recall that while f_k is, a priori, a function of all of the decision variables, in practice f_k might depend strongly on a proper subset of the objects. In the case of Example 1 for instance, if the job data is not too pathological, then the delays are unlikely to depend at all on distant time periods. This also depends on the current solution; the fewer staff we allocate to nearby time periods, the wider the window of dependence could be. We can show that a given window is wide enough by making the worst-case assumption outside that window, and manually verifying that the bound is still valid. To exploit this in a generic manner, we need some more notation. Given y ∈ Y and any subset L ⊆ [n], we let ∆(L, y) ∈ Z^n_+ denote the integral vector which we get by increasing y_j to M for all j ∈ [n] \ L. For example, if n = 5 then ∆({3, 4}, (4, 3, 3, 2, 4)) = (M, M, 3, 2, M). In other words, we specify the number of resources allocated to objects in L, and assume the maximum elsewhere. Since f_k is non-increasing, we have f_k(∆(L, y)) ≤ f_k(y) for all y ∈ Y and all L ⊆ [n]. If equality is obtained, however, then we have shown that f_k does not depend on the number of resources allocated to objects in [n] \ L, as long as we do not increase the numbers in L. In other words, if f_k(∆(L, ȳ)) = f_k(ȳ), then

$\theta_k \ge f_k(\bar{y}) \Big( 1 - \sum_{j \in L} \sum_{\xi > \bar{y}_j} z_{\xi j}(y) \Big)$    (8)

is a valid optimality cut. This cut is stronger than the previous one if L is a proper subset of [n], and it is much stronger if L is small. This introduces a trade-off between the benefit of finding a small set L which obtains equality, and the computational expense of doing so. There is no way to derive a minimal set L analytically, so we will find one using local search. In practice, this means simulating the performance of a number of intermediate solutions.

Example 3. We illustrate the difference between (6), (7), and (8). Let n = 5, m = 1, M = 5, with no additional constraints. Then there are 5^5 = 3125 feasible integer solutions. Let ȳ = (4, 3, 3, 2, 4) be the incumbent. (7) is non-trivial at 4 · 3 · 3 · 2 · 4 = 288 solutions. On the other hand, if L = {3, 4} and f_k(ȳ) = f_k(∆({3, 4}, ȳ)), then (8) is non-trivial at 5 · 5 · 3 · 2 · 5 = 750 different solutions, and tight for at least 5^3 = 125 of them. See Figure 1.

Cuts (6), (7), and (8) multiply the desired bound by a logical expression that deactivates the cut if we cannot be sure it is valid at the current solution. To strengthen (8), we would like to make the coefficients of the binary variables as small as possible. Fortunately, if we run a few more simulations, we can bound the error in the f_k(ȳ) estimate when the number of resources is increased. Let e_j = (0, . . . , 1, . . . , 0) ∈ R^n be the jth standard basis vector with a 1 in the jth position, and 0 elsewhere.
For ξ ∈ N we define an expression, I_k(ξ), which is the performance of k if we allocate ξ resources to π(k), and the maximum number of resources elsewhere. Formally speaking,

$I_k(\xi) = f_k\big(\Delta(\{\pi(k)\}, \; \xi \, e_{\pi(k)})\big).$

Recall that π(k) is an object naturally associated with the kth performance measure. Our assumption is that f_k depends very strongly on the number of resources allocated to π(k). Since f_k is non-increasing, if ξ ≥ ȳ_π(k), then I_k(ξ) is a valid lower bound on θ_k if we increase the number of resources allocated to π(k) from ȳ_π(k) to ξ, no matter what we do elsewhere. Now define Base_k(ȳ) = f_k(∆({π(k)}, ȳ_π(k) e_π(k))). Again by monotonicity, Base_k(ȳ) is a valid lower bound on θ_k if we do not increase the number of resources allocated to π(k), no matter what we do elsewhere. This proves that

$\theta_k \ge f_k(\bar{y}) - \sum_{\xi > \bar{y}_{\pi(k)}} \big(f_k(\bar{y}) - I_k(\xi)\big) z_{\xi \pi(k)}(y) - \big(f_k(\bar{y}) - \mathrm{Base}_k(\bar{y})\big) \sum_{j \in L \setminus \{\pi(k)\}} \sum_{\xi > \bar{y}_j} z_{\xi j}(y)$    (9)

is a valid Benders cut. Instead of deactivating the cut, we compensate the bound by an appropriate amount when the number of resources allocated to objects in L are increased. The new bounds might be weaker than f_k(ȳ), but if they are not trivial, they may still save us explicitly enumerating some solutions. We are not prevented from increasing the resources both at π(k) and at other elements of L at the same time, but then we suffer both penalties. If we increase the number of allocated resources for too many of the objects at once, the bound may become effectively trivial. Nevertheless, this cut is at least as strong as the previous three.

Now we will consider what happens when we specify the number of resources allocated to one or two objects at a time, and make the most optimistic assumption possible elsewhere. In the language of the LBBD literature, we will add a relaxation of the subproblem to the master problem. First of all, we know that

$\theta_k \ge \sum_{\xi \in N} I_k(\xi) \, z_{\xi \pi(k)}(y)$    (10)

is valid for MP. This is because I_k(ξ) is a valid bound on f_k whenever ξ resources are allocated to π(k), and precisely one term on the right-hand side is non-zero. This bound can possibly be tightened if we consider a second object. Let l ∈ [n] \ {π(k)} be one such object. For ξ_1 ∈ N and ξ_2 ∈ N, let W^k_[π(k),l](ξ_1, ξ_2) = f_k(∆({π(k), l}, ξ_1 e_π(k) + ξ_2 e_l)) denote the performance of k whenever ξ_1 resources are allocated to π(k), and ξ_2 resources are allocated to l. Then the resulting family of cuts, (11), is valid for the master problem. Computational experience suggests that this cut is weak, which makes sense, since the cut is only active if the correct number of resources are allocated to both objects at once. As an intermediate step towards strengthening it, fix ξ_1, and suppose we tried to impose the bound of W^k_[π(k),l](ξ_1, ξ_2) on θ_k whenever ξ_2 resources are allocated to l, regardless of how many are allocated to π(k). The necessary inequality, (12), would be stronger, but is no longer valid in general. However, since f_k is non-increasing, it is valid if y_π(k) ≤ ξ_1. Suppose, on the other hand, that y_π(k) > ξ_1. Then regardless of the value of ξ_2, the error in our estimate is no greater than a quantity that does not depend on ξ_2. It follows that the correspondingly compensated cut, (13), is valid for the master problem for all ξ_1 ∈ N. An analogous argument, beginning instead with "suppose we tried to impose the bound of W^k_[π(k),l](ξ_1, ξ_2) on θ_k whenever ξ_1 resources are allocated to π(k), regardless of how many are allocated to l," leads us to a second set of cuts, (14), for all ξ_2 ∈ N. The one-dimensional cuts described earlier are the special cases which we obtain when either ξ_1 = M or ξ_2 = M. When MP is a MIP, the initial cuts can significantly strengthen the initial LP relaxation. A pair of cuts of the form (13) and (14) are formally valid for MP for all k ∈ K and l ∈ [n] \ {π(k)}.
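Before turning to the question of how many initial cuts to generate, the following is a minimal gurobipy sketch of how the pieces of this section fit together in practice: the level indicators z, the master problem (5) with a linear cost vector, and a lazy-constraint callback that adds cut (8) at incumbent solutions. The helper names, the assumption of a linear cost g_cost, the simulate(k, y) wrapper around the discrete-event simulation (which should be memoised), and the simple nearest-object ordering used to grow L are all our own simplifications, not the paper's implementation.

```python
import gurobipy as gp
from gurobipy import GRB

def build_master(n, m, M, K, g_cost, c):
    mdl = gp.Model("MP")
    levels = list(range(m, M + 1))
    y = mdl.addVars(n, lb=m, ub=M, vtype=GRB.INTEGER, name="y")
    z = mdl.addVars(levels, range(n), vtype=GRB.BINARY, name="z")
    theta = mdl.addVars(K, lb=0.0, name="theta")
    for j in range(n):
        # Exactly one level indicator is on, and it matches y_j.
        mdl.addConstr(gp.quicksum(z[xi, j] for xi in levels) == 1)
        mdl.addConstr(gp.quicksum(xi * z[xi, j] for xi in levels) == y[j])
    mdl.setObjective(gp.quicksum(g_cost[j] * y[j] for j in range(n))
                     + gp.quicksum(c[k] * theta[k] for k in K), GRB.MINIMIZE)
    mdl.Params.LazyConstraints = 1
    return mdl, y, z, theta

def delta(L, ybar, n, M):
    # Raise every component outside L to the maximum level M.
    return tuple(ybar[j] if j in L else M for j in range(n))

def find_L(k, ybar, pi_k, n, M, simulate):
    # Greedily expand L = {pi(k)} until f_k(Delta(L, ybar)) = f_k(ybar),
    # adding nearby objects first (a crude stand-in for the local search).
    target, L = simulate(k, ybar), {pi_k}
    while simulate(k, delta(L, ybar, n, M)) > target + 1e-9 and len(L) < n:
        L.add(min((j for j in range(n) if j not in L),
                  key=lambda j: abs(j - pi_k)))
    return L

def make_callback(n, M, K, pi, y, z, theta, simulate):
    y_list = [y[j] for j in range(n)]
    theta_list = [theta[k] for k in K]
    def callback(mdl, where):
        if where != GRB.Callback.MIPSOL:
            return
        ybar = tuple(int(round(v)) for v in mdl.cbGetSolution(y_list))
        theta_bar = mdl.cbGetSolution(theta_list)
        for i, k in enumerate(K):
            fk = simulate(k, ybar)
            if theta_bar[i] < fk - 1e-6:
                L = find_L(k, ybar, pi[k], n, M, simulate)
                # Cut (8): active only if no object in L gains resources.
                switch = 1 - gp.quicksum(z[xi, j] for j in L
                                         for xi in range(ybar[j] + 1, M + 1))
                mdl.cbLazy(theta[k] >= fk * switch)
    return callback
```

After adding any problem-specific constraints defining Y and any initial cuts, the model would be solved with mdl.optimize(make_callback(...)); the known objective value of the incumbent could also be fed back to the solver (for instance via cbSetSolution) in the spirit of lines 6 to 8 of Algorithm 2.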
However, generating all possible initial cuts can become computationally intensive, since in our case, we need to run a discrete-event simulation for each evaluation of W^k_[π(k),l]. Recall, however, that in practice, f_k is likely to depend most strongly on the number of resources allocated to objects which are, in some sense, "neighbours" of π(k). For instance, in Example 1, the total delay for all jobs that arrive in a given time period depends strongly on the adjacent time periods. Therefore we will restrict the initial cuts to pairs which are "nearest neighbours" in some natural sense. The notion of neighbourhoods will depend on the structure of the problem, but for the cases we consider, one is readily apparent. Now, having laid the theoretical groundwork, the rest of this paper will focus on applications.

Example 1 introduced a class of scheduling problems with performance measures that reflect an underlying delay structure, which is typical of queuing problems. In this section we will describe this class of problems in greater detail. We are given a finite set T = {1, . . . , |T|} of time slices, and a finite set J of jobs. The time periods have length L, and by "time period t" we mean the half-open interval [(t − 1)L, tL) ⊂ R. Each job has a release date and a processing time. We get to decide how many agents are allocated to each time period. The jobs are processed by available agents as they are released, and are scheduled on a FCFS basis. We assume an agent will always finish processing a job that starts, even if the number of agents is due to be reduced. To formulate an optimization problem, we introduce the following variables.

1. For all t ∈ T, y_t is the number of agents allocated to time period t, and
2. For all j ∈ J, σ_j is the start time of job j.

Note that while j was used to index the objects in previous sections, we are now using t to index the objects (which are time periods) and j to index the jobs. For each j ∈ J let r_j ∈ [0, |T|L) and ρ_j ∈ Q_+ be the release time and processing time respectively of the jth job. Because the scheduling rule is fixed, the start times are completely determined by the agent levels. Hence there exists a mapping

$\sigma : \mathbb{Z}^T_+ \to \mathbb{R}^J,$    (15)

which takes a vector y ∈ Z^T_+ of agent levels to the corresponding vector, σ(y) ∈ R^J, of start times. We can assume that the jobs are sorted by release date. For a fixed y ∈ Z^T_+, we can compute σ(y) recursively as follows. For j ∈ J and t ∈ [0, ∞), let N(t, j, y) = |{j' ∈ J : j' < j, σ_{j'}(y) + ρ_{j'} > t}| denote the number of jobs before j which are still being processed at time t. Suppose σ_1, . . . , σ_{j−1} are known. Then σ_j is the optimal solution to the following optimization problem:

$\min \; \sigma_j$    (16)
$\text{subject to} \quad \sigma_j \ge \max\{\sigma_{j-1}(y), \, r_j\},$    (17)
$N(\sigma_j, j, y) < y_t \quad \text{whenever } \sigma_j \in [(t-1)L, tL).$    (18)

Constraint (17) enforces the FCFS rule, and constraint (18) ensures that job j does not start until at least one agent is available. Note that this formalism is only required for the proof of the following lemma. In practice we will not need the σ variables. Indeed, this is largely the point, since incorporating N(t, j, y) into an integer program would be challenging. When using Branch-and-Simulate, the start times and delays of the jobs are calculated during the simulations, and their contribution to the full problem is accounted for by the Benders cuts. Lemma 4 establishes that the start times are non-increasing functions. In other words, adding agents to any time period cannot increase the start time of any job.
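Before stating this monotonicity result formally, here is a compact, illustrative Python sketch of the discrete-event evaluation of the start times and the per-period delays implied by the recursion above. The function and variable names are ours; the actual simulation code accompanying the paper is in its GitHub repository. Jobs are assumed to be pre-sorted by release time, y[t] gives the number of agents in the (zero-indexed) period [tL, (t+1)L), and any job still waiting at the end of the horizon is simply left to the boundary condition discussed later.

```python
import heapq

def simulate_delays(jobs, y, L):
    """jobs: list of (release, processing) pairs sorted by release time."""
    T = len(y)
    busy = []                    # completion times of jobs currently in service
    delay = [0.0] * T
    prev_start = 0.0
    for release, proc in jobs:
        start = max(prev_start, release)       # FCFS: never overtake a predecessor
        while True:
            while busy and busy[0] <= start:   # drop jobs that have finished
                heapq.heappop(busy)
            if start >= T * L:                 # spills past the horizon
                break
            t = int(start // L)
            if len(busy) < y[t]:               # an agent is available: start now
                break
            # Otherwise wait for the next completion or the next period boundary.
            start = min(busy[0], (t + 1) * L) if busy else (t + 1) * L
        heapq.heappush(busy, start + proc)
        prev_start = start
        t_release = min(int(release // L), T - 1)
        delay[t_release] += start - release    # aggregate by release period
    return delay
```

Running this evaluation twice, once for y and once for a vector with one extra agent in some period, never produces a larger delay vector, which is exactly the behaviour the next lemma formalizes.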
The theorem which follows states that the total delay performance measures are non-increasing, which will allow us to apply the theoretical development from Section 3. The theorem is an immediate consequence of the lemma, together with the fact that the composition of two non-increasing functions is also non-increasing.

Lemma 4. The components, σ_j(y), of (15), are non-increasing functions of y.

Proof. It is obvious that σ_1(y) is non-increasing. Fix j ∈ J and suppose σ_1(y), . . . , σ_{j−1}(y) are non-increasing. Since the ρ_{j'} are constants, σ_{j'}(y) + ρ_{j'} is non-increasing for j' ∈ {1, . . . , j − 1}, which means N(t, j, y) cannot increase with y. Moreover, since r_j is a constant, max{σ_{j−1}(y), r_j} is non-increasing. It follows that σ_j(y) is non-increasing, which completes the proof.

Now suppose the release dates and processing times are random variables. As explained in Section 1, we will compute a sample-average approximation of the expected performance over some set S of scenarios. For each s ∈ S let J_s be the set of jobs in scenario s. For each j ∈ J_s let r_sj and ρ_sj denote the release date and processing time respectively, of job j in scenario s. We can apply Lemma 4 separately to each scenario. We get performance measures by aggregating the jobs according to the time period they were released in.

Theorem 5. For all s ∈ S and t ∈ T, the total delay, Delay_st(y), of all jobs released in time period t in scenario s, is non-increasing.

Many other classical, scheduling-style performance measures, such as total completion time or total tardiness, also satisfy the monotonicity condition. Indeed, any non-increasing function of the start times (that we can compute) provides a suitable objective function for our problem. We will focus on the total delay in what follows. The full problem is

$\min_{y \in Y} \; g(y) + \frac{1}{|S|} \sum_{s \in S} \sum_{t \in T} c_t \, \mathrm{Delay}_{st}(y)$

where g : Y → R is a cost function for the agents, c_t are non-negative real constants, and Y is the feasible region. In practice, we might have auxiliary variables and constraints which refine the feasible region. For instance, in Section 6, y will be required to represent a valid shift schedule. For the most part, we are not interested in the precise structure of the master problem. Formulating a direct integer program which (i) incorporates the stochasticity of the job arrivals, (ii) enforces FCFS queue discipline, and (iii) models the exact delays of the jobs in each scenario, would be challenging. But since Delay_st are non-increasing functions, the Benders cuts and initial cuts described in Section 3 are valid for problems of this form. The MP is

$\min_{\theta, y} \; g(y) + \frac{1}{|S|} \sum_{s \in S} \sum_{t \in T} c_t \theta_{st}, \quad \text{subject to} \quad y \in Y, \; \theta \in \mathbb{R}^{S \times T}_+,$

where θ_st is a new continuous variable intended to estimate Delay_st(y). We will add initial cuts using all of the pairs (s, t − 1), (s, t) of adjacent time periods, for each scenario. The set L which features in the Benders cuts will be derived using Algorithm 3, where the idea is to expand (and contract) the set {t} greedily until the necessary equality is obtained. The total delay of all jobs released in a specific time period is not trivial to compute. However, for each scenario, we can evaluate the delay of all the jobs during a discrete-event simulation of that scenario. The computational details of simulating multiserver queues are well known, and Python code is available in the accompanying GitHub repository. The boundary condition at the end of the time horizon can be handled in a variety of ways. In Section 5 we will impose a large penalty for each job that reaches the end of the time horizon without being processed.
In Section 6, based on practical considerations, we will introduce a sufficiently long, (|T| + 1)st time period, with a small number of staff available. In both cases the monotonicity property of the performance measures is clearly preserved. Note that over the course of the algorithm, we may need the value of certain performance measures at a given solution more than once. Therefore it is essential to store the values computed during any simulation for later use.

In this section we will focus on an ACCA problem. This application is motivated by the increasing passenger air traffic witnessed over the past few decades. Despite the recent disruption caused by the COVID-19 pandemic, this growth is projected to continue. Modern airports tend to service a large number of distinct airlines, flights, and passengers. Two major chokepoints on passenger flow within an airport are security checkpoints, and check-in terminals. And although self-service check-in terminals are becoming more common, traditional staffed checkpoints are likely to remain necessary for several reasons, including security concerns, baggage handling, and personal preferences. In any case, the model we develop is applicable to a wide variety of multiserver queuing problems. Two major classes of check-in systems are employed in airports: common use systems consist of a set of counters which accept any passenger, regardless of their flight, while exclusive use systems are dedicated to individual flights. We focus on common use check-in systems, since they are more common in practice, and more complex from a modelling perspective.

An early integer programming model was proposed by van Dijk and van der Sluis (2006), who minimize the number of check-in counters required to service all flights. Simulation modelling is then used to evaluate the performance of the solutions produced by the optimization model. We will test our model on a numerical example derived from this paper, although the precise optimization problem is distinct. Bruno and Genovese (2010) proposed several integer programming models for determining the number of check-in counters to open. Araujo and Repolho (2015) extend their work by introducing a service level constraint, which caps the proportion of passengers still in the queue at the end of each time period. They also use simulation modelling to evaluate solutions to the approximate model. Bruno et al. (2019) extend the model further by incorporating a staff scheduling component. Lalita et al. (2020) propose an integer programming model that attempts to enforce FCFS queue discipline, by ensuring that sufficient check-in capacity is available. These models account for the stochasticity of the problem by including estimates of the passenger arrival rates and service times as model parameters, so they fail to account for any variance in those parameters. We can think of this as a "single scenario" approach. Furthermore, they embed only an approximation of the queuing structure into the model. With Branch-and-Simulate on the other hand, the simulation data for each scenario is available during optimization, so no approximation of the queuing structure is required. FCFS queue discipline is maintained inherently, and the exact queuing time of each passenger in each scenario is known. For a single scenario, we can guarantee a globally optimal solution to the precise queuing model.
And, as we will see in §5.2, we can solve realistic instances to optimality with 100 scenarios, yielding a stable and accurate solution to the stochastic variant of the problem. Another advantage of Branch-and-Simulate is the fact that the approach does not depend on the structure of the feasible region. So practitioners can incorporate additional constraints as their context demands. Indeed, as we will see in Section 6, incorporating a shift scheduling structure is straightforward. Since the simulation itself is a black box from the point of view of the solver, we can also introduce as much additional complexity as desired, as long as monotonicity is preserved. Finally, we could optimize over historical data. That is, we could, in principle, optimize over previous daily arrival patterns for the relevant airport, instead of using synthetically generated scenarios, or estimates of the parameter averages. With all this in mind, Branch-and-Simulate represents a significant advance on the state-of-the-art for this area.

There are many variants of the ACCA problem. We consider a fairly general one. We are given a set T of time periods, a set F of flights, and a set P of passengers. Each time period has length L, and each passenger is scheduled to board one of the flights. The passengers arrive over the course of the time horizon, and must pass through the check-in terminal. Passengers are served on a FCFS basis at open check-in counters, and we get to decide how many check-in counters to open in each time period. We can open up to M check-in counters at a time. We incur a fixed cost of D for opening a check-in counter for one time period, as well as a cost of Q per time period of total delay (that is, queuing time). We will incur a penalty of |T| × L (an arbitrarily large value) for each passenger that we fail to serve by the end of the time horizon. For each flight f ∈ F we have a deadline d_f, which is the latest time we are allowed to finish servicing any passenger for that flight. Since the arrival times and service times are stochastic, deadlines will need to be enforced at the level of the scenarios. We assume that we finish serving any passenger we begin serving, even if a counter is due to close.

It is not hard to see that the ACCA problem is a special case of the scheduling problem introduced in Section 4. The passengers are the jobs, the time periods are the objects, the agents are staffed check-in desks, and the queuing time of passengers corresponds to the delay of the jobs. The passenger arrivals and service times are assumed to be stochastic, so we will compute a sample-average approximation of the expected cost using some set S of scenarios. To ensure that the passenger deadlines are met in every scenario, we will introduce two families of constraints. For each f ∈ F let P_f ⊂ P denote the set of passengers for that flight. For each t ∈ T we define two quantities: B_t is the maximum total service time of passengers for flights which depart during or before time period t, while A_t is the maximum total service time of all passengers who arrive during or after time period t. Service level constraints built from these quantities for t ∈ T ensure that sufficient check-in capacity is made available to clear all passengers in every scenario, as long as M is sufficiently large. The complete master problem is specified in Appendix A.

In this section we will present the results of some computational experiments. Implementations were programmed in Python 3.8.8 via the Anaconda distribution (4.9.2).
MPs were solved using Gurobi 9.1.2 (Gurobi Optimization, LLC, 2021), and discrete-event simulations were programmed from scratch. Jobs were run in parallel on a computing cluster with 2.5GHz CPUs. For benchmarking purposes, each job was allocated up to 5GB of memory and a single thread, although we were usually able to achieve faster solution times using a typical desktop PC.

To generate instances for the ACCA problem, we used the numerical example from van Dijk and van der Sluis (2006). We have 21 time periods, each of which is L = 30 minutes long. There are 10 flights, and 2160 passengers. The number of passengers for each flight is given in (21). For each flight we have an "arrival period" which consists of seven sequential time periods. The first time of each arrival period is given in the third row of (21). All passengers for a flight arrive in the corresponding arrival period. The distribution of passenger arrivals over an arrival period is given in (22). The deadline for each flight is the end of its arrival period (note that no passengers arrive in the final period). To generate a set of scenarios, for each s ∈ S and each j ∈ P, the arrival time of passenger j in scenario s is constructed from the starting time period Start_{f_j} of its flight f_j, a draw from the discrete random variable W with the distribution given by (22), and a draw from a discrete uniform distribution U. Service times were sampled from an exponential distribution with mean 2. In the objective function, we incur a fixed cost of D = 40 for opening a check-in counter for one time period, and in each time period we can open up to M = 20 check-in counters. We consider a range of queue costs, with Q = 40 being typical in the literature.

First we will compare the relative quality of the Benders cuts described in Section 3 with respect to runtime, and the optimality gap obtained. To that end, we generated 25 sets of |S| = 100 scenarios in the manner described above. We considered queuing costs Q ∈ {5, 10, 15, . . . , 40}, for a total of 200 distinct instances. These instances grow in difficulty as the queuing cost decreases. We attempted to solve each instance with the four Benders cuts described in Section 3, with and without initial cuts, constituting eight approaches. We label the cuts by NG, M, L, and S (for the no-good cut, and cuts (7), (8), and (9) respectively), and +In indicates that initial cuts were used as well. Each job was allowed 3600 seconds of computational time. NG+In, NG, and M were unable to reach the time limit without exceeding the memory capacity in any instance. Performance profiles for the five remaining methods are provided in Figure 2. Result files for individual jobs are available in the accompanying GitHub repository.

The performance profiles show that L+In and S+In were able to solve the majority of instances to optimality within 1000 seconds, and achieve a small optimality gap for the rest. Initial cuts always took at least 209 seconds to generate. Although L+In and S+In are comparable, the relative strength of cut S is made clear by the disparity between methods S and L. Without initial cuts, even L exceeded the memory capacity on all but four of the instances, while S still solved close to half of the instances to optimality within the allowed time. This demonstrates the strength of both S and the initial cuts. Naturally, we are also interested in the accuracy of the sample-average approximation itself.
Figure 3 tabulates the mean objective values for the eight queuing costs over the 25 sets of scenarios, with their standard deviations. Standard deviations range from 0.232% to 0.3% of the respective means, which indicates that our sample-averages closely approximate the true expected values for each instance. We have also noted the average total runtimes (using S+In) for each queuing cost, to the nearest second. All instances with a queuing cost of at least 10 were solved to optimality well within the allowed time. Figure 4 shows that not only are the optimal objective values stable with respect to the sample-average approximation, but the optimal solutions are as well. In Figure 4, the horizontal axes index the y variables, and the thicknesses of the curves indicate the differences between the maximum and minimum optimal values of the corresponding variable, for four queuing costs, over the 25 sets of scenarios. The figures demonstrate that 100 scenarios is, indeed, sufficiently many to obtain a stable solution to the stochastic problem.

We are also interested in the number of simulations run, as a proportion of the feasible solutions. The service level constraints make it challenging to compute the precise number of feasible solutions to a given instance. For the sake of argument, consider the first (ID 0) set of scenarios used in the computations above. First we optimized MP with the new objective function, Σ_{t∈T} y_t. In doing so we found that to satisfy the service level constraints, we must open at least 153 check-in counters over the 21 time periods. We then added a constraint of the form Σ_{t∈T} y_t = 153, and re-optimized the model with the objective min max_{t∈T} y_t. In doing so we found that ȳ = (1, 0, 0, 0, 9, 9, 9, 9, 9, 9, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9) is a feasible solution to this instance. Since ȳ is feasible, every solution which is componentwise larger than ȳ is also feasible. Therefore $\prod_{t \in T} (21 - \bar{y}_t) \approx 3.727 \times 10^{23}$ is a lower bound on the total number of feasible solutions for this instance. It is a lower bound because we have not accounted for feasible solutions which we can obtain by increasing one level and decreasing another. In any case, among the 200 instances solved with S+In detailed above, the maximum number of simulations run was 603948, which is a tiny proportion of the feasible solutions. This illustrates the soundness of Branch-and-Simulate as a methodology.

In this section we will focus on the NHSS problem. This application is motivated by the rapidly increasing number of elderly people. Even though elderly people have, on average, better overall health than before, many still face disabilities or other chronic conditions that come with age. In view of this, the demand for long-term care is projected to grow substantially in the next few decades. Health care is a prominent application area in operations research, but researchers have mostly addressed optimization problems for hospitals, emergency services, and home care services. Studies focusing on nursing homes are scarce. The studies Lieder et al. (2015), Moeke et al. (2014), van Eeden et al. (2016), and Bekker et al. (2019a) are the first and so far the only ones to address capacity planning and shift-scheduling problems for nursing homes. The challenge in this context is to meet residents' requests for care or assistance within a reasonable time frame. Requests are partly scheduled in advance (such as taking medicines) and partly unscheduled (such as going to the bathroom).
Perhaps the most ambitious study to date is Bekker et al. (2019b). Based on real-life data of both scheduled and unscheduled care in a nursing home, the authors formulate an optimization model to determine the optimal shift scheduling for one day. To deal with the complexity and size of the problem, they approximate the optimization problem using a Lindley recursion. The resulting MIP is then solved using standard techniques. Although the results provide important insights into the optimal staffing pattern for a nursing home during one day, the approach also has several drawbacks. Besides the fact that only an approximation of the problem is solved, the approach does not scale well; analyzing larger nursing homes or extending the optimization problem to several days seems infeasible.

Branch-and-Simulate enjoys a number of advantages in this context, and they mirror those already discussed in the context of the ACCA problem. The simulation data informs the optimization model, so no approximation of the patient request structure is required. At the level of the scenarios, we are able to compute an exact solution. As we will see in §6.2, this can be done even with a large number of scenarios. Therefore, we can be confident that the solution closely approximates the true optimal solution to the stochastic problem. Moreover, the approach is scalable; we can still solve the problem in a reasonable amount of time after adding additional requests, or extending the time horizon. Like the ACCA problem, we can optimize over historical request patterns for a specific nursing home.

We are given a set T of time periods, and a set Λ of shift lengths in hours. Each time period has length L in minutes. Shifts are allowed to start every L minutes, beginning each day from time 0. In light of practical considerations, we assume L is a divisor of 60. We let Γ denote the set of possible shifts, which can be generated ahead of time. If Λ = {4, 8} and L = 60, for instance, then we allow all shifts of 4 or 8 hours, starting every hour from time 0. For each t ∈ T and γ ∈ Γ we let α_tγ be a binary indicator, with α_tγ = 1 if and only if shift type γ overlaps time period t. We will choose which shift types to use, and then schedule them. We have a minimum number m of care workers which must be available in each time period, and a maximum number h_max of working hours to allocate over the whole time horizon. For M we can choose a sufficiently large M ≥ h_max / min Λ. At the end of the day we have a "night shift" with a fixed number of available care workers. The number and preferred start times of scheduled requests are known in advance, but the unscheduled requests are stochastic, as are the durations of all requests. Requests that we fail to service by the end of the day are serviced by the night staff instead, and are penalised by virtue of minimization. We assume that few enough requests arrive during the night that the night staff are able to handle them. Therefore, should we wish to extend the time horizon over a number of days, the simulations for different days can be disaggregated.

To see that the NHSS problem is a special case of the queuing problem introduced in Section 4, note that as usual, the start times of the requests are determined by the number of care workers available in each time period. These levels are, in turn, uniquely determined by the shift schedule. The feasible region will be the set of care worker levels that constitute legal shift schedules.
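The link between shift selections and care-worker levels is simple enough to sketch directly. The following is an illustrative Python sketch (our own helper names, not the paper's code) of how the shift set Γ, the coverage indicators α, and the mapping from a chosen schedule to the levels y_t can be generated; it assumes, for simplicity, that shifts start on period boundaries and fit entirely inside the day.

```python
def generate_shifts(num_periods, L_minutes, shift_lengths_hours):
    periods_per_hour = 60 // L_minutes            # L is assumed to divide 60
    shifts = []                                   # Gamma
    for hours in shift_lengths_hours:             # e.g. Lambda = {4, 8}
        span = hours * periods_per_hour           # periods covered by the shift
        for start in range(num_periods - span + 1):
            shifts.append(range(start, start + span))
    return shifts

def coverage(shifts, num_periods):
    # alpha[t][g] = 1 iff shift g overlaps time period t
    return [[1 if t in s else 0 for s in shifts] for t in range(num_periods)]

def worker_levels(x, alpha):
    # x[g] copies of shift g are scheduled; returns the staffing level y_t
    return [sum(a * xg for a, xg in zip(row, x)) for row in alpha]
```

Under the fit-inside-the-day assumption, the Λ = {4, 8}, L = 60 setting used in §6.2 would enumerate 13 four-hour and 9 eight-hour shift types over the 16-hour work day; the minimum-staffing, maximum-hours, and night-shift conditions then become linear constraints on the x and y variables in the master problem of Appendix B.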
As usual, formulating an integer program which captures the stochasticity of the unscheduled requests, and the inherent queuing structure of the problem, would be challenging, so we will use the Branch-and-Simulate approach described in Section 4 instead. The MP is specified in Appendix B, and we describe the results of some computational experiments in §6.2.

Since the objective of the NHSS problem is the raw total delay of the jobs, a natural heuristic approach is to allocate care workers in proportion to the total work over time, or as close to it as we can manage. More precisely, suppose Y is the feasible region of the NHSS problem; that is, the set of worker level vectors which constitute a legal shift schedule (see Appendix B). For a given set S of scenarios, let r_sj and ρ_sj be the preferred start time and service time respectively of request j in scenario s. Then

$$\frac{1}{|S|} \sum_{s \in S} \; \sum_{j \,:\, r_{sj} \text{ in period } t} \frac{\rho_{sj}}{60} \qquad (23)$$

is the average number of staffing hours needed to service all of the requests released in time period t. We can compute the vector y ∈ Y which minimizes the average absolute difference between y_t and (23) over all time periods. This heuristic does not account for requests which are released concurrently or for requests which are backlogged into future time periods. Nevertheless, it provides us with a sensible starting solution to the master problem.

For the NHSS problem, we use fictional data that is based on the statistical analysis of nursing home operations provided in van Eeden et al. (2016) and Bekker et al. (2019b). A work day consists of 16 hours from 7 AM to 11 PM, with the standard night shifts scheduled between 11 PM and 7 AM. In view of safety regulations, as well as standard practice, we require a minimum of m = 2 care workers scheduled at all times, with a (very safe) maximum of M = 20 care workers scheduled at a given time. There are two shift types: short shifts have a duration of 4 hours, while long shifts have a duration of 8 hours. A shift can start at every full hour. Thus we take Λ = {4, 8} as the shift types, and divide the day into intervals of length L = 60 minutes. We make use of a fictional dataset with 224 scheduled requests that follow the daily pattern of scheduled care requests used in Bekker et al. (2019b). These have been made available as a CSV file in the associated GitHub repository. Unscheduled requests can be of either the short type or the long type, and each request has an 80% chance of being short. Short requests are sampled from an exponential distribution with mean 1.89 minutes, while long requests use a mean of 9.28 minutes. Unscheduled requests follow a Poisson process with rate λ = 20. Later, to generate more challenging instances, we will also use a 50% short chance with a rate of λ = 5 (note that Bekker et al. used a 90% chance).

Firstly, as we did with the ACCA problem, we will study the stability of the optimal solution with respect to the sample-average approximation. This time, however, we will vary the number of scenarios used. We generated 30 sets of |S| = 1, 10, 100, and 250 scenarios, for a total of 120 instances. Each instance was solved using S+In after warm-starting with the heuristic. As before, we afforded each job 3600 seconds, 5 gigabytes of memory, and a single thread. The CPU architecture is also identical. Figure 5 is akin to Figure 4, where the horizontal axis indexes the y variables, while the thickness of a curve indicates the difference between the maximum and minimum values of the corresponding variable in optimal solutions.
Optimizing over a single scenario, as expected, produces considerable variance in the optimal shift schedule. And while 10 scenarios is better, optimizing over 100 scenarios produces a fairly stable solution. Increasing the number of scenarios to 250 provides little extra benefit.

Next we want to examine the optimization results in greater detail. To that end, we generated another 100 sets of |S| = 100 scenarios of the NHSS problem, which we also solved using S+In after warm-starting with the heuristic. The results for the first instance, together with summary statistics, can be found in Figure 6. On average, 28.64% of the total runtime was spent simulating solutions. The majority of that time occurs while generating initial cuts, with an average of only 0.54 seconds being spent in the callback. This is because the initial cuts alone are almost enough to solve most instances. Indeed, 41 of the instances were solved to optimality at the root node. We will examine the cause in a moment. For now, note that the apparent disparity between the time spent in the simulation, and the time spent generating initial cuts, comes from the overhead associated with adding 57,000 constraints to the model. We solved the same instances without initial cuts (S-In, with λ = 20 and an 80% short chance), and tabulated the results in Figure 7. Without initial cuts, more branch-and-bound nodes are explored while the lost ground is made up by Benders cuts. In this case it is clear that the initial expenditure required to generate initial cuts is worthwhile. Nevertheless, all instances can still be solved in short order with the strengthened Benders cut.

To make the instances more challenging, we quickened the rate of unscheduled requests to λ = 5, and decreased the probability of short requests to 50%. Results for 100 new instances are tabulated in Figure 8; the average objective values there disagree because 12 instances were not solved to optimality within the time limit without initial cuts. Injecting longer, more numerous requests into the data increases the solution times, the number of nodes explored, and the number of Benders cuts generated. This is because working hours are scarce, so adding extra work typically lengthens the backlog of delayed jobs. Therefore the size (that is, |L|) of the Benders cuts increases. Since wider cuts are weaker, more cuts are required overall, and the solution time suffers.

In this paper we have proposed a novel approach to solving certain complicated optimization problems, which we call Branch-and-Simulate. The idea is to simulate the performance of each incumbent solution to a relaxed model. The simulation data then informs the trajectory of the optimization model itself, via Benders cuts. Strong Benders cuts ensure that we converge to an exact solution to the original optimization problem, and that the number of simulations we actually need to run is relatively small. We applied Branch-and-Simulate to a class of stochastic resource allocation problems with monotonic performance measures. In particular we looked at two concrete special cases: an airport check-in counter allocation problem, and a nursing home shift-scheduling problem. The majority of existing optimization models in these areas are characterized by the following features:
1. The stochastic nature of the optimization problem is accounted for by including estimates of the underlying random variables as model parameters, and
2. The inherent queuing structure of the problem is only approximately modelled.

An advantage of the existing models is that they can be solved with standard techniques. What they fail to account for, however, is the non-trivial variance of the underlying distributions. Optimizing over a set of samples large enough to capture this variance greatly increases the difficulty of the problem. And even for few scenarios, trying to model the queuing structure exactly leads to an intractable problem. Branch-and-Simulate, on the other hand, can be used to solve these problems to optimality with as many as 100 scenarios, in a reasonable amount of time. As we saw in §5.2 and §6.2, for realistic instances, we can incorporate the queuing structure exactly, and still quickly obtain stable, accurate solutions to the stochastic problem. This is true even after modifying the parameters to make the instances more challenging. For the NHSS problem, the most challenging instances were solved, on average, in only 109.07 seconds, leaving scope to apply the model to much larger nursing homes in the future.

We should not overlook the fact that our computational results were achieved despite our imposition of significant limitations on the computational resources. First of all, all computations were performed using a single thread. Since the simulation itself is independent of the optimization model, there is significant scope for parallelization, especially of the initial cuts. In addition, the simulations were coded from scratch in Python. Further speed-ups could be obtained by using a faster, compiled language, and better hardware.

We summarize some opportunities for future research.

Additional Constraints. As we have already noted, the proposed Branch-and-Simulate approach does not depend on the structure of the feasible region. To avoid distracting the reader from the primary message (that is, the estimation of complicated performance measures via Benders cuts), we have omitted some potentially interesting details. Consider the NHSS problem. It may be desirable to limit the number of four-hour shifts used. We could also consider a finer subdivision of the time horizon, say into 15 or 30 minute intervals. We were able to solve instances with the time horizon extended to fourteen days, but this is only interesting if the scheduled request patterns are substantially different on different days. In the context of the ACCA problem, it may be desirable to impose additional constraints pertaining to the structure of the queue, such as maximum queue lengths or queuing times. During a simulation it is easy to detect if qualitative constraints like these are violated by the current solution. We can eliminate these solutions using feasibility cuts. We can even do much better than a no-good cut, since we know that at least one more check-in counter will need to be opened in the set L prior to the problematic time period. It would be interesting to pursue this in more detail.

Multiple Classes. For ease of exposition, we have considered only one class of objects, one class of resources, and one family of performance measures at a time. Thanks to the generality of (4), extending the approach to multiple classes is straightforward.

Conditional Value at Risk. The sample-average approximations we used give uniform importance to every scenario.
It may also be interesting to optimize the conditional value at risk (CVaR). Let w_0, w_1 > 0 be real numbers with w_0 + w_1 = 1, and let β > 0. Define auxiliary variables CV, α, and u_s for s ∈ S. The master problem becomes

$$\min \; g(y) + w_0 \, \frac{1}{|S|} \sum_{s \in S} \sum_{j \in [n]} c_j \theta_{sj} + w_1 \, CV,$$

subject to the original master constraints together with

$$CV \ge \alpha + \frac{1}{\beta |S|} \sum_{s \in S} u_s, \qquad u_s \ge \sum_{j \in [n]} c_j \theta_{sj} - \alpha, \quad u_s \ge 0, \quad s \in S.$$

When the θ variables correctly estimate the performance measures in a master optimal solution, the objective value becomes a weighted average of the sample mean and the β-Conditional Value-at-Risk, Rockafellar and Uryasev (2000).

Approximately Monotonic Performance Measures. The Benders cuts and initial cuts derived in Section 3 are valid because the performance measures they estimate are monotonic. The question arises: what if the performance measures are only approximately monotonic? In other words, what if adding more resources can occasionally worsen the solution? We considered a problem of this form, namely an ambulance location problem, and the results can be found in Appendix C. While the cuts used are not formally valid, with Branch-and-Simulate we are still able to obtain a strong approximation. We strengthen this claim by solving a set of smaller instances by brute force, and confirming that the error is negligible. Although this does not prove the claim for the larger instances, for small instances at least, the error is of a lower order than that inherent in the sample-average approximation.

Logic-based Benders decomposition, unlike its classical counterpart, does not yet enjoy the status of being a "standard technique." We believe this paper constitutes important progress in this direction. Branch-and-Simulate allows us to feed simulation data into an optimization model using Benders cuts. We believe there is significant scope for applying Branch-and-Simulate to other optimization problems, where incorporating certain complicated structures directly would render the model intractable. In particular, we hope practitioners will be able to apply our theoretical development to other resource allocation problems with monotonic performance measures.

The parameters for the ACCA problem are summarized in Table 1, and the master problem is given by (29) to (36). The objective function, (29), is the combined cost of opening check-in counters, and of passenger queuing time. Constraints (30) and (31)

The parameters for the NHSS problem are summarized in Table 2, and the master problem for the NHSS problem is (37) to (45). The objective minimizes the delay of all requests. Constraint (38) ensures that at most h_max working hours are allocated over the time horizon. Constraint (39) ensures that the staff levels represent a valid shift schedule. The remaining constraints specify the variables.

In this section we will describe an ambulance location problem. This was omitted from the main body of the paper because we realized the performance measures lack the monotonicity property required for the Benders cuts to be valid. But since monotonicity violations are relatively pathological, we expect Branch-and-Simulate to provide a strong approximation. In §C.1.5 we provide very preliminary computational results. The model is tested on synthetic call logs over randomly generated graphs. In the future we would like to benchmark the approach against existing methods using real data sets. For now our goal is only to convince the reader that Branch-and-Simulate may still be a good idea if your performance measures only approximately satisfy the conditions which make the Benders cuts valid.
In many life-threatening situations, the most important factor for survival is the response time of emergency services. Therefore the proportion of calls which are reached within a specified threshold time is a natural measure of the performance of an emergency medical services system. This proportion is influenced, among other things, by the number of ambulances allocated to each station, which leads us to pose a resource allocation problem.

Many ambulance location problems are formulated as covering problems. The Set-Covering Location Problem was introduced by Toregas et al. (1971) and asks for the minimum number of ambulances, and their locations, which cover all demand points. The Maximal Covering Location Problem was introduced by Church and ReVelle (1974) and asks us to maximize the number of covered demand points using a fixed number of ambulances. These models require many simplifying assumptions. In particular, they do not account for the stochastic nature of ambulance availabilities. In reality, when an ambulance is dispatched to a call, it is not available for other calls for some amount of time. Failing to account for this causes those models to overestimate the quality of the solutions they produce. An early example of a probabilistic model is the Maximum Expected Coverage Location Problem, introduced by Daskin (1983). He extends the maximal covering formulation by introducing the notion of a busy fraction. When an ambulance is dispatched to service a call it is not able to service other calls for some period of time. The busy fraction is the probability that an ambulance is unable to service a demand at a given time. This approximates the stochastic element of the problem, while preserving our ability to solve the problem with standard techniques. Various approaches to computing the busy fraction have been proposed in the literature. Unfortunately, the busy fraction can be difficult to estimate without its own simplifying assumptions, such as homogeneous call data. Moreover, the busy fraction itself depends on the number and location of the ambulances. Ingolfsson et al. (2008) propose an iterative approach which involves updating the busy fraction after each sequential solution is found.

Sung and Lee (2018) propose a scenario-based approach to solve an ambulance location model where the number of ambulances is fixed. They propose a two-stage stochastic programming approach with integer recourse. Instead of modeling ambulance availability using a busy fraction, they compute the exact dispatch and return times for each scenario. They propose a Benders decomposition approach with the L-shaped cuts of van Slyke and Wets (1969), but they themselves note that these cuts are too weak to optimally solve the problem. This is due to the weak LP relaxation of the subproblems. Therefore they propose several approximation schemes. Another study formulates a stochastic integer programming model which incorporates two types of ambulances and multiple call priorities. They sample arrival scenarios directly from historical call logs, and so avoid distributional assumptions. They propose a Benders decomposition approach with cuts based on the LP relaxation of the dispatching subproblem. Yoon and Albert (2021) consider a dynamic model, where ambulances are assigned to patients in real time via a Markov decision process, instead of a static dispatching rule. The literature on ambulance location and dispatching models is extensive, so we refer the reader to these papers for more detailed reviews.
Many ambulance dispatching models rely on Markov processes to model system performance. These models often assume that calls arrive according to homogeneous Poisson processes, and that service times are exponentially distributed. Like Sung and Lee (2018) and the study just described, our approach does not require this assumption, because once the set of scenarios has been generated, the problem becomes deterministic. We will use classical probability distributions to generate synthetic test instances, but we stress that this is not necessary from a theoretical or practical point of view. Historical call logs are preferable in practice, but we were unable to source such data due to privacy considerations.

The precise problem we consider is as follows. We are given a finite set V of demand points, and a set Ω of stations, which are candidate locations for ambulances. We have a metric d on the set V ∪ Ω of locations, which represents travel time. Over a finite time horizon (say, a day) calls to emergency services will be made from the demand points. When a call is made, an ambulance is dispatched from a station to service the call. That ambulance will not be available to service other calls until it returns to its home station. We have an upper bound M on the number of ambulances which can be allocated to a station, and a fixed fleet of M_1 ambulances. We are given a positive number δ, which is the target arrival time for calls. The task is to allocate ambulances to stations, in order to minimize the expected number of calls which we fail to reach on time. In this paper we consider only one type of ambulance, and one type of call, though adding more classes of calls and ambulances is an easy extension. For each call, there is a random delay before the dispatched ambulance will leave. When an ambulance arrives, there is a random on-scene time. Each call also has some probability of requiring transport to a hospital. The locations of the hospitals are known, but the time an ambulance spends dropping the patient off at a hospital, before returning to its home station, is also random.

To account for the stochastic elements of the problem, we will, as usual, compute a sample-average approximation over a set S of scenarios. A scenario s ∈ S consists of a list of calls. For each call we sample each of the relevant distributions; that is, the time and location of the call, the pre-trip delay of the ambulance, whether or not that call requires a hospital, and the time the ambulance will spend at the hospital before returning. Since the scenarios are sampled in advance, our approach does not depend on the specific distributions of these parameters. To get our performance measures, for each scenario we will aggregate the calls according to their closest station, with respect to the metric d described above.

Whether or not we reach a call within the threshold time depends not only on the relative allocation of the ambulances, but on how we choose to dispatch ambulances to calls. A basic dispatching rule which is common in practice is the "closest available" rule, which dispatches the nearest idle ambulance. It is easy to show that this does not satisfy the monotonicity condition.

Example 6. Let V = Ω = {0, 1}. In other words, there are two demand points, both of which are also stations. For simplicity assume that there are no delays. Suppose two calls arrive at demand point v = 0, at times t_1 and t_2 > t_1, respectively, and let y_1 = (1, 0) be the current solution.
Suppose the on-scene time of the first call is ε > t_2 − t_1, so that the second call is made before the ambulance finishes servicing the first call. If ε < δ, the closest available dispatching rule with y_1 reaches both calls within the threshold. Now consider y_2 = (1, 1), with an additional ambulance at the second station. Closest available dispatch dictates that the second ambulance should be dispatched to the second call when it is made. But if the distance between the two stations is sufficiently large, we will fail to reach that call in time. In other words, adding an ambulance made the performance measure for the first station worse. ♦

Instead of dispatching the closest ambulance, suppose we dispatched whichever ambulance will reach the call first. We know the delays in each scenario, and the current availability of ambulances before each call. So when a call is made we can determine which station would be able to reach the next call first, given the current activity of its ambulances. We then dispatch the next-available ambulance from that station, as soon as it becomes available. We refer to this as an "earliest arrival" dispatching rule. While this resolves the previous example, it is still possible to derive instances where adding ambulances worsens a performance measure. Nevertheless, we will still apply the Branch-and-Simulate approach of this paper to this problem, with the expectation that this will result in some errors. For a fixed allocation of the ambulances we can calculate the response times (and therefore, the performance) for each scenario during a discrete event simulation. We coded the simulations from scratch in Python.

The master problem for the ambulance location problem is simple. The objective is to minimize the sample average of the missed calls, $\frac{1}{|S|}\sum_{s \in S}\sum_{\omega \in \Omega} \theta_{s\omega}$, subject to

$$\sum_{\omega \in \Omega} y_\omega = M_1, \qquad (47)$$

together with the variable specifications

$$\theta_{s\omega} \in \mathbb{R}_+, \quad s \in S, \ \omega \in \Omega, \qquad (50)$$
$$z_{\xi\omega} \in \{0, 1\}, \quad \xi \in N, \ \omega \in \Omega.$$

Algorithm 4: ALP cut generation
1: L ← {ω}
2: i ← 1
3: while f_sω(Δ(L, ȳ)) < f_sω(ȳ) do
4:     L ← L ∪ {ω′ ∈ Ω : ∃ v ∈ p(ω) with p_i(v) = ω′}
5:     i ← i + 1
6: for ω′ ∈ L \ {ω}, in a last-in-first-out manner, do
7:     if f_sω(Δ(L \ {ω′}, ȳ)) = f_sω(ȳ) then
8:         L ← L \ {ω′}
9: return L

We have introduced a new variable, θ_sω, which estimates the number of calls in scenario s, whose first preference station is ω, that we fail to reach on time. Constraint (47) fixes the fleet size. The rest of the constraints specify the variables (note that N = {0, 1, . . . , M}). After generating initial cuts, we solve the master problem with the callback routine specified in Algorithm 2. It remains to specify how we will generate initial cuts and Benders cuts. For this class of problems, there are a variety of natural ways to construct neighborhoods. Note that for each demand point we can sort the stations in order of distance, with respect to the metric d mentioned above. For v ∈ V and i = 1, . . . , |Ω|, let p_i(v) be the i-th preference station for v. For ω ∈ Ω, let p(ω) = {v ∈ V : p_1(v) = ω}. If all of the ambulances allocated to station ω are busy, then calls which arrive at nodes v ∈ p(ω) might need to be served by more distant stations. So before we solve the master problem, we will add initial cuts using all pairs (s, ω), (s, l), such that there exists a demand point v ∈ V whose first preference station is ω, and whose second preference station is l. When generating a Benders cut on θ_sω, we initialise L = {ω}, and in the i-th iteration, add the (i + 1)-st preference stations of all demand points whose first preference station is ω.
We will then remove stations one at a time while the required equality is still preserved. The procedure is described in Algorithm 4.

We tested our Branch-and-Simulate approximation on randomly generated undirected graphs in the plane. We begin by initializing n random points in a square segment of the plane such that each node is a minimum distance from each other node. The travel time on an edge is its Euclidean length. We have a maximum degree for each node. All edges are added to the graph, one at a time, if they are shorter than a fixed threshold, and will not violate the maximum degree condition. We then check if the graph is connected. If it is not, then we increase the threshold by one and try again. We use the minimum distance for the initial threshold. We continue iteratively until we obtain a connected graph. Then K nodes are randomly chosen to be stations, and E stations are randomly chosen to be hospitals as well. Requests arrive at nodes following independent Poisson processes. The arrival rate of calls at a node v is equal to 360 − 10 deg(v), where deg(v) is the degree of that node. We use a time horizon of 720 minutes. The pre-travel delays are lognormally distributed with a mean of 3.25 (195 seconds) and a standard deviation of 1.6 (96 seconds), following Ingolfsson et al. (2008). The remaining parameters have been chosen by us. The average on-scene time for calls is 10 minutes. The probability that a call requires transport to a hospital is 20%, and the average time spent dropping patients off at a hospital is 3 minutes. On-scene and hospital times are exponentially distributed. In practice it would be better to use parameters based on a statistical analysis of the specific geographic region, or to sample from historical call data.

For the basic experiment we generated graphs with n = 100 nodes, K = 40 stations, and E = 10 hospitals, in a city of size 40 × 40 with minimum node distance 2 and maximum node degree 5. The response time target is δ = 9. It is well known among combinatorialists that the number of feasible solutions to this problem is equal to the coefficient of $x^{M_1}$ in the polynomial $(1 + x + \cdots + x^M)^K$, where K is the number of stations. We will allow at most M = 1 ambulance per station, which means the problem is that of choosing which stations will have ambulances, and which stations will not. We can compute the coefficient of $x^{M_1}$ in $(1 + x)^{40}$ by expanding the expression. This is done automatically in our Python code (a small sketch is given below). Note that the instances grow more challenging as K increases and as M_1 approaches K/2. We generated 20 instances for each choice of M_1 = 25, 26, 27, 28, and 29 ambulances, and tabulated the results in Figure 9. Branch-and-Simulate obtains small MIP gaps within the time limit. It does so despite simulating only a very small proportion of the total number of feasible solutions. Since we cannot be sure that the model objective value is accurate, we calculated the true objective value using the simulation. That the model errors are small suggests that the solutions may be near optimal despite the invalid Benders cuts. But in principle, the cuts might have eliminated a significantly better feasible solution. To show that this is unlikely, we also solved a set of smaller instances by brute force, and compared them to the model solution. We reduced the size of the city to 50 nodes in a 25 × 25 segment of the plane. We have K = 20 stations and M_1 = 10 ambulances to locate.
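The coefficient extraction just described is straightforward in Python. The sketch below is our own illustration (the function name count_allocations is hypothetical): for general M it expands $(1 + x + \cdots + x^M)^K$ by repeated polynomial multiplication, and for M = 1 the count reduces to a binomial coefficient.

```python
from math import comb

def count_allocations(K, M, M1):
    """Number of ways to allocate M1 ambulances to K stations, at most M per station.

    Computes the coefficient of x**M1 in (1 + x + ... + x**M)**K by
    repeated polynomial multiplication.
    """
    poly = [1]            # the constant polynomial 1
    base = [1] * (M + 1)  # coefficients of 1 + x + ... + x^M
    for _ in range(K):
        new = [0] * (len(poly) + M)
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                new[i + j] += a * b
        poly = new
    return poly[M1] if M1 < len(poly) else 0

# With M = 1 the count is simply a binomial coefficient.
assert count_allocations(40, 1, 25) == comb(40, 25)
assert count_allocations(20, 1, 10) == comb(20, 10)  # 184,756 feasible allocations
```

In particular, the small brute-force instances (K = 20, M = 1, M_1 = 10) have C(20, 10) = 184,756 feasible allocations, each of which must be simulated.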
To reduce the computational burden of the brute force approach, we reduced the number of scenarios to |S| = 10. After solving these instances with Branch-and-Simulate, we then simulated the performance of every feasible solution. The results are tabulated in Figure 10. We found that for these small instances, the error which comes from using invalid Benders cuts is of a smaller order than the error inherent in a sample-average approximation approach. We leave a more systematic and realistic study of ambulance location problems for future research.

Figure 9: Results for the ambulance location experiment with the number of ambulances to be allocated ranging from 25 to 29. We report averages over 20 instances for each choice of parameters: (i) the objective values, (ii) the MIP gap obtained, (iii) the true objective value of the model solution, obtained using the simulation, (iv) the error in the model solution, (v) the total optimization time, including initial cuts, and (vi) the number of simulations as a proportion of the number of feasible solutions. To better represent the gaps and errors, we also report the number of instances with non-zero optimality gaps after the time limit.

Figure 10: Results for instances small enough to solve by brute force. We report on (i) the model objective, (ii) the MIP gaps, (iii) the true optimal objective value obtained by brute force, (iv) the error in the brute force solution, (v) the total optimization time, and (vi) the total brute force time.

Acknowledgments. M Harris is supported by an Australian Government Research Training Program Scholarship. HM Jansen and T Taimre were partly funded by Australian Research Council (ARC) Discovery Project DP180101602. We thank Marko Boon for his help in starting up this research project. We are grateful to René Bekker and Dennis Moeke for sharing a fictional data set.

References
Optimizing the airport check-in counter allocation problem
Demand-driven task-scheduling in a nursing home setting: A genetic algorithm approach
Keeping pace with the ebbs and flows in daily nursing home operations
Partitioning procedures for solving mixed-variables programming problems
A mathematical model for the optimization of the airport check-in service problem
A decision support system to improve performances of airport check-in services
The Maximal Covering Location Problem
A maximum expected covering location model: Formulation, properties and heuristic solution
Stochastic planning and scheduling with logic-based Benders decomposition. arXiv e-prints, art
Solving a stochastic facility location/fleet management problem with logic-based Benders decomposition
Generalized Benders decomposition
Logic-based Benders decomposition and binary decision diagram based approaches for stochastic distributed operating room scheduling
Gurobi Optimizer Reference Manual
Logic-based Methods for Optimization: Combining Optimization and Constraint Satisfaction
Logic-based Benders decomposition for large-scale optimization
Logic-based Benders decomposition
Optimal ambulance location with random delays and travel times. Health care management science
Mathematical formulations for large scale check-in counter allocation problem
The integer L-shaped method for stochastic integer programs with complete recourse
Task scheduling in long-term care facilities: A client-centered approach
Stochastic allocation and scheduling for conditional task graphs in multi-processor systems-on-chip
Scale and skill-mix efficiencies in nursing home staffing: inside the black box
On the performance of small-scale living facilities in nursing homes: a simulation approach
The Benders decomposition algorithm: A literature review
Optimization of conditional value-at-risk
Scenario-based approach for the ambulance location problem with stochastic call arrivals under a dispatching policy
The location of emergency service facilities
Check-in computation and optimization by simulation and IP in combination
Care on demand in nursing homes: a queueing theoretic approach. Health care management science
L-Shaped linear programs with applications to optimal control and stochastic programming
Dynamic dispatch policies for emergency response with multiple types of vehicles
A stochastic programming approach for locating and dispatching two types of ambulances