title: Safe Delivery of Critical Services in Areas with Volatile Security Situation via a Stackelberg Game Approach
authors: Mai, Tien; Sinha, Arunesh
date: 2022-04-25

Vaccine delivery in under-resourced locations with security risks is not just challenging but also life-threatening. The current COVID pandemic and the need to vaccinate have added even more urgency to this issue. Motivated by this problem, we propose a general framework to set up limited temporary (vaccination) centers that balance physical security and desired (vaccine) service coverage with limited resources. We set up the problem as a Stackelberg game between the centers operator (defender) and an adversary, where the set of centers is not fixed a priori but is part of the decision output. This results in a mixed combinatorial and continuous optimization problem. As part of our scalable approximation of this problem, we provide a fundamental contribution by identifying general duality conditions for switching max and min when both discrete and continuous variables are involved. We perform detailed experiments to show that the proposed solution is scalable in practice.

Vaccine delivery has always been a challenge in under-resourced parts of the world Zaffran et al. (2013). The problem is further aggravated by the threat of violence against vaccine providers in areas where the security situation is volatile Gannon et al. (2020). A safer vaccine delivery plan for such places can save lives, both via vaccination of the general population and physical protection of the front-line vaccine providers. Specifically, vaccination drives in underdeveloped areas are often performed by setting up a limited number of temporary vaccination centers.
Motivated by this issue, we propose and study a general framework to set up such temporary centers that balances the physical security requirements of the centers and achieves desired vaccination coverage. Our first contribution is a flexible model that allows the choice of a small subset of centers to operate, along with a consideration of how to allocate security resources to the operational centers. Further, the framework allows for fairness constraints that ensure fairness in center allocation for different geographical regions as well as fairness in security allocation to operating centers. The model is set up as a Stackelberg game between the centers operator (defender) and a bounded rational adversary who follows the Quantal Response (QR) model. The model takes inspiration from Stackelberg security games Tambe (2011) in the manner in which security resources are allocated, and the center-selection aspect is inspired by product assortment and pricing problems Wang (2012). While inspired by these models from disparate areas, we believe this is the first model that brings together security issues and selection of a subset of centers (targets) within one framework. Our second contribution is a hybrid algorithm that combines the best of two different approaches. The first of these two approaches is a Mixed Integer Linear Program (MILP) with guaranteed approximation, and the second is a polynomial time heuristic. All algorithms work by a sequence of modifications to the original optimization problem, starting with a binary search on the objective value. The first algorithm exploits properties of the problem at hand to convert a bi-linear MIP formulation to a MILP with guaranteed approximation; but MILPs are not scalable in practice. Hence, we design a heuristic where we use Lagrangian duality and reach a sub-problem which is a max-min-max tri-level optimization with discrete variables for the outer max problem and continuous variables for the two inner min-max problems.
Then, in a fundamental technical contribution, we identify general conditions for minimax equality when the variables involved are discrete and continuous. While the conditions do not hold in rare cases for our problem, we use this result as a heuristic to transform our sub-problem to a min-max-max problem, and then we present a polytime approximation for the transformed problem. The polynomial time approach provides immense scalability in practice, with solutions close to the MILP approach in almost all cases. Finally, we show that the MILP and the heuristic can be made to work together in a scalable hybrid approach with approximation guarantees for the output solution. We conduct thorough experiments to analyze various aspects of our three approaches. We show that our main hybrid algorithm is scalable in practice (solving for up to 5000 potential centers within 1.5 minutes) and also much better than competing baseline approaches. While we are inspired by vaccine centers operation, our model and ideas can be applied to critical-services problems such as the operation of temporary health camps, exam centers, etc. in underdeveloped and security risk-prone areas, or even to existing security games work where a subset of targets can be chosen to be made unavailable. All missing proofs are in the appendix. To summarize, our main contributions are: 1. A novel and flexible security game model that brings together elements of choosing a subset of targets to operate and security considerations under one umbrella. 2. A fundamental minimax equality result with discrete and continuous variables that is of independent interest in optimization research. 3. An algorithm that uses the above result as well as exploits other properties to solve the game approximately in polynomial time. 4. Experiments showing the much superior scalability of our algorithm against baselines. Our work takes inspiration from two main distinct topics. Product Assortment and Pricing.
Our problem is most closely related to the problem that combines product assortment and price optimization, in which a subset of products is chosen from an assortment of products and simultaneously the prices of the products are chosen with the goal of maximizing profit in a market where the buyers choose a product following the QR model Wang (2012). The relation to our problem stems from the analogy of the set of products to the set of centers and of continuous prices to security allocation. However, prior work Wang (2012) solves an unconstrained optimization (no constraints on prices), whereas we deal with a constrained optimization. Other works in this area optimize just the product assortment with fixed prices Davis et al. (2014); Désir et al. (2020). In other works (still with fixed prices), different models of buyers have been considered, such as rational best-responding buyers Immorlica et al. (2018) and other discrete choice models Davis et al. (2014); Gallego et al. (2015). Further afield, there is work on an online version of the product assortment problem using a multi-armed bandits formulation Agrawal et al. (2016); Cheung and Simchi-Levi (2017), but again with fixed prices. Stackelberg Security Games. There is a large body of work on game theoretic models and algorithms for physical security Tambe (2011) of a given fixed set of vulnerable targets, which have also been deployed in the real world Tambe (2011). Distinct from prior work, our problem requires choosing which centers (targets) to operate, which yields a novel mixed combinatorial and continuous optimization problem. In addition, we improve upon a prior result Yang et al. (2012) that used piecewise linear approximation (PWLA) to solve a sub-sub-problem that arises in our analysis. We show that this sub-sub-problem can be computed in closed form. Recent work explores the complexity of quantal response players in more generality for Stackelberg games Cerny et al. (2020); Milec et al.
(2021), calling the equilibrium a Quantal Stackelberg Equilibrium (QSE). Our problem differs because of an additional combinatorial dimension (operating a subset of centers) of the defender's action space. Moreover, in contrast to the hardness results in these works, we obtain arbitrarily precise approximation for our problem in polynomial time. Thus, we identify a sub-class of games where the QSE is efficiently approximable to arbitrary precision, even with a complex defender action space. Facility Location. Our problem of which vaccine centers to operate might seem like the maximum capture problem in competitive facility location Benati and Hansen (2002); Freire et al. (2016); Mai and Lodi (2020). However, our problem is very different, as we consider security for centers, have no consideration of capturing market share, and abstract away from fine-grained logistic modelling. There are many variants of the facility location problem Procaccia and Tennenholtz (2013). However, as far as we know, there is no work that considers providing security to facilities as an optimization criterion, and most works consider only binary decision variables. The interplay between discrete and continuous optimization is an important aspect of our work. We model the stated problem as a general-sum Stackelberg game between a defender and a QR adversary. The defender operates a subset of centers (which changes at a set frequency, e.g., weekly) and allocates security resources to the operational centers. The adversary attacks an operational center. Action spaces. The defender has a candidate set of potential centers, denoted by K. The variable S, where S ⊆ K, denotes the defender's choice of centers to operate. However, not all subsets are feasible; F(K) ⊂ 2^K is the feasible set of sets of centers. Suppose every center can vaccinate at least P_min people.
Two natural restrictions are that every S ∈ F(K) must have |S| ≤ C and P_min |S| ≥ N_P P_min (i.e., |S| ≥ N_P) for some integer constants C, N_P, capturing a budget constraint and a minimum number of N_P P_min people to be vaccinated every round. In addition, we consider another natural constraint: the candidate center set K is partitioned into K_1, ..., K_L such that any feasible choice S ∈ F(K) contains at least one location from each partition l ∈ {1, ..., L} (L < C), that is, |K_l ∩ S| ≥ 1 for all l. The set K_l contains the possible operable center locations in a contiguous geographic region l; hence, this constraint captures fairness in the choice of allocation of vaccine centers across different geographical regions. We call this fairness in vaccine center allocation (FVCA). Continuing with the defender's action description, the defender also allocates security resources to the centers S ∈ F(K) that are chosen to be operational. The number of security resources is fixed and denoted by m. The defender's pure strategy is to allocate the m security resources to the |S| operational centers. The mixed strategy is represented succinctly by x_S = (x_j)_{j∈S} (with x_S ∈ D_{x_S} = [0, 1]^{|S|}), which denotes the marginal probabilities of defending the |S| centers that are chosen to operate. Further, in order to provide fairness in security allocation (FSA), we impose the constraints Σ_{j∈K_l∩S} x_j ≤ β_l for all l and some given constants β_l. In words, these constraints impose an upper bound β_l on the chance of protecting facilities that operate in region l, thereby ensuring that no geographic region is given unusual preference in terms of protection. E.g., we could have β_l ∝ (|K_l|/|K|) m with the proportionality constant greater than 1. Overall, the defender's action is (S, x_S) with the constraints stated above. The adversary's pure strategy is to choose one among the |S| operational centers to attack; locations in K\S have no operational center and cannot be attacked.
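The feasibility constraints on S described above (budget C, coverage N_P, and the FVCA requirement of one center per region) can be checked directly. The following is a minimal Python sketch; the names `is_feasible` and `partitions` and the small example data are ours, not the paper's:

```python
from typing import List, Set

def is_feasible(S: Set[int], partitions: List[Set[int]], C: int, N_P: int) -> bool:
    """Check the feasibility constraints on a candidate center set S:
    budget |S| <= C, coverage |S| >= N_P (so at least N_P * P_min people
    can be vaccinated per round), and the FVCA fairness constraint
    |K_l intersect S| >= 1 for every geographic partition K_l."""
    if not (N_P <= len(S) <= C):
        return False
    return all(len(S & K_l) >= 1 for K_l in partitions)

# Hypothetical candidate set K = {0,...,5} split into L = 3 regions.
partitions = [{0, 1}, {2, 3}, {4, 5}]
print(is_feasible({0, 2, 4}, partitions, C=4, N_P=3))  # True
print(is_feasible({0, 1, 2}, partitions, C=4, N_P=3))  # False: region 3 uncovered
```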
Utilities. For every potential center j ∈ K, if the center is operating, the adversary attacks j, and the center is protected, then the defender obtains reward r^d_j and the adversary obtains l^a_j. Conversely, if the defender is not protecting the operating center j, then the defender obtains l^d_j (r^d_j > l^d_j) and the adversary gets r^a_j (r^a_j > l^a_j). Given x_j, the expected utilities of the defender and attacker for an attack on an operational center j are U^d_j(x_j) = x_j r^d_j + (1 − x_j) l^d_j and U^a_j(x_j) = x_j l^a_j + (1 − x_j) r^a_j. For ease of notation, we note that these utilities are linear in x_j and hence we rewrite them as U^d_j(x_j) = w^d_j x_j + l^d_j and U^a_j(x_j) = −w^a_j x_j + r^a_j, where w^d_j = r^d_j − l^d_j ≥ 0 and w^a_j = r^a_j − l^a_j ≥ 0. These utilities are valid for location j ∈ K only when j has an operational center, that is, j ∈ S. Note that S is a decision variable and not fixed a priori. We use the QR model for the adversary's response. QR is a well-known model McFadden (1976); McKelvey and Palfrey (1995), and is used extensively in Stackelberg security games. The QR model is essentially a soft version of the rational best response model. Specifically, QR posits that the adversary will attack an operational center j ∈ S with probability q_j(x_S) = e^{λ U^a_j(x_j)} / Σ_{j'∈S} e^{λ U^a_{j'}(x_{j'})}. (1) The parameter λ ≥ 0 governs rationality. λ = 0 means least rational, as the adversary chooses its attack uniformly at random, and λ = ∞ means fully rational (i.e., attacks a center with highest utility). Thus, QR has the flexibility to model a range of behavior. Then, the defender's expected utility for (S, x_S) is F(S, x_S) = Σ_{j∈S} q_j(x_S) U^d_j(x_j). Stackelberg equilibrium. The Stackelberg equilibrium (also called Quantal Stackelberg Equilibrium in Cerny et al. (2020)) can be computed using the following optimization: max_{S∈F(K), x_S∈D_{x_S}} F(S, x_S) (EqOPT) subject to Σ_{j∈S} x_j ≤ m (2) and Σ_{j∈K_l∩S} x_j ≤ β_l for all l (3). Constraint (2) states that there can be at most m security resources. Constraint (3) captures the FSA fairness criterion. The S ∈ F(K) in the subscript of max captures the three constraints |S| ≤ C, |S| ≥ N_P, and the FVCA fairness criterion.
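As a concrete sketch of the QR response in Eq. (1) and the objective F(S, x_S), the following Python computes the attack distribution and the defender's expected utility for given marginals; the function names are ours and this is an illustration under the linear-utility rewriting above, not the paper's code:

```python
import math

def attack_probs(x, w_a, r_a, lam):
    """QR attack distribution over the operational centers:
    q_j proportional to exp(lam * U^a_j(x_j)), with attacker utility
    U^a_j(x_j) = -w^a_j x_j + r^a_j, where w^a_j = r^a_j - l^a_j."""
    scores = [math.exp(lam * (-w_a[j] * x[j] + r_a[j])) for j in range(len(x))]
    Z = sum(scores)
    return [s / Z for s in scores]

def defender_utility(x, w_d, l_d, w_a, r_a, lam):
    """Defender expected utility F(S, x_S) = sum_j q_j(x_S) U^d_j(x_j),
    with defender utility U^d_j(x_j) = w^d_j x_j + l^d_j."""
    q = attack_probs(x, w_a, r_a, lam)
    return sum(q[j] * (w_d[j] * x[j] + l_d[j]) for j in range(len(x)))
```

With lam = 0 the distribution is uniform over the operational centers, matching the "least rational" extreme described above.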
The objective F(S, x_S) is the expected utility of the defender. Comparison to Stackelberg security games with QR adversary. Our model above is distinct from the standard security game with a QR adversary Fang et al. (2017) in the aspect that we allow the defender to choose a subset of targets that then are part of the game, while the other targets are not part of the game. A naive exploration of all subsets is clearly computationally infeasible, and next we present approaches to address this problem. As stated in the introduction, we present two approaches, one based on a MILP formulation with solution quality guarantees (in this section) and another based on a polynomial time heuristic in Section 4. We further combine these two approaches into one hybrid approach in Section 5 that provides approximation guarantees for the solution, as well as similar scalability as the heuristic. We start with a transformation that is used in all our approaches. We use the Dinkelbach transform Dinkelbach (1967) to convert the fractional objective of EqOPT to a non-fractional one. We use the shorthand notation q_j(x_S; λ) = N(x_j)/D(x_S) to express Eq. 1 succinctly (as λ is obvious, it is not stated explicitly). By definition, D(x_S) = Σ_{j∈S} N(x_j). Hence the objective of EqOPT can be written as F(S, x_S) = Σ_{j∈S} N(x_j) U^d_j(x_j) / D(x_S). The Dinkelbach transform works by seeking the highest value of a threshold δ such that there exist some feasible S, x_S with Σ_{j∈S} N(x_j) (U^d_j(x_j) − δ) ≥ 0. The highest possible δ can be computed by a binary search where in each round of the search the feasibility problem stated above is solved for a particular δ_0. For given δ_0, the feasibility problem is easily solved by checking whether the maximum of the following optimization is ≥ 0 or not: max_{S∈F(K), x_S∈D_{x_S}} Σ_{j∈S} N(x_j) (U^d_j(x_j) − δ_0) (BOPT) subject to Constraints (2-3). While the above is a common technique, for completeness, the binary search template over δ_0 is shown in Algorithm 1 (Binary Search Template), where BOPT is solved repeatedly (line 4) till convergence.
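The binary search of Algorithm 1 over the Dinkelbach threshold δ_0 can be sketched as follows, assuming a hypothetical `solve_bopt` oracle (our name) that returns the optimal value of BOPT for a given δ_0:

```python
def binary_search_delta(solve_bopt, lo, hi, tol=1e-6):
    """Binary search template (Algorithm 1) on the Dinkelbach threshold.
    solve_bopt(delta) returns the optimal value of BOPT, i.e.
    max_{S, x_S} sum_{j in S} N(x_j) (U^d_j(x_j) - delta).  This value is
    decreasing in delta, and the fractional optimum of EqOPT equals the
    largest delta for which it is still >= 0, so bisection applies."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if solve_bopt(mid) >= 0:
            lo = mid   # feasible: the optimal objective is at least mid
        else:
            hi = mid
    return lo
```

For instance, with the toy oracle `lambda d: 3.5 - d` (playing the role of f(δ)), the search converges to 3.5.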
We present different ways of solving BOPT in the sequel for our different approaches. However, all our approaches for solving BOPT are approximate, making Algorithm 1 a binary search with inexact function evaluation, which necessitates the following result: Theorem 1. Suppose BOPT is computed with additive error of ξ and runtime T in line 4 of Algorithm 1. Let f(δ) be the optimal value of BOPT for δ. Then Algorithm 1 runs in time O(T log(1/ε)) and returns a solution whose objective value is within an additive ξ + ε of the optimum of EqOPT. In the rest of this paper, all our approaches will focus on approximately solving BOPT (line 4) in Algorithm 1. The additive approximation guarantee of each approach can directly be used as ξ in the above theorem to obtain the overall approximation guarantee of that approach. The BOPT optimization objective is separable in the x_j's. We exploit this and use a piecewise linear approximation (PWLA) to obtain a non-linear integer program. We follow a recipe similar to prior work Yang et al. (2012), and divide the range of x_j, i.e., [0, 1], into K equal intervals and represent each x_j = Σ_{k∈[K]} r_{jk}, where r_{jk} = 1/K if k ≤ ⌊Kx_j⌋, r_{jk} = x_j − ⌊Kx_j⌋/K if k = ⌊Kx_j⌋ + 1, and r_{jk} = 0 otherwise. However, in contrast to Yang et al. (2012), we need additional binary variables θ_j (θ_j = 1 if j ∈ S, θ_j = 0 otherwise) to represent center selection, which results in bi-linear terms of the form θ_j r_{jk} in the resultant MIP. This bilinear MIP is shown in the appendix. Bi-linear terms can be linearized using the well-known big-M approach Wu (1997). But this naive linearization results in K|K| additional variables and 3K|K| additional constraints, making such an approach not scalable at all. Next, we show that the naive bi-linear MIP can be formulated as a MILP with no additional variables and only |K| additional constraints, with upper bounds for the approximation error given in the result below. Theorem 2. BOPT is approximated by the MILP (ApxOPTL) obtained by linearizing the bilinear MIP (ApxOPT) given in the appendix. Let B(S, x_S) be the objective function of (BOPT) and S*, x*_S be an optimal solution of (BOPT).
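The PWLA representation of each x_j by segment variables r_{jk} described above can be sketched as a small helper (our hypothetical implementation, not the paper's):

```python
import math

def pwla_segments(x, K):
    """PWLA representation of x in [0,1] over K equal intervals:
    r_k = 1/K for every fully-used interval k <= floor(K x), the next
    segment carries the fractional remainder x - floor(K x)/K, and all
    later segments are 0, so that sum(r) == x."""
    full = math.floor(K * x)        # number of fully-used intervals
    r = [1.0 / K] * full
    if full < K:
        r.append(x - full / K)      # fractional segment
        r += [0.0] * (K - full - 1)
    return r
```

For example, `pwla_segments(0.37, 20)` yields seven segments of 0.05, one segment of 0.02, and twelve zeros, which sum back to 0.37.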
Let (θ', z', r') be an optimal solution of (ApxOPTL), which provides a solution S', x'_S whose objective value B(S', x'_S) is within an additive O(1/K) of B(S*, x*_S). Using Theorem 1 and the above result, it can be shown that when the solution of ApxOPTL is used in line 4 of the binary search of Algorithm 1, we attain an approximation of O(1/K + ε). However, the runtime of a MILP in the worst case is not polytime. Next, we propose a heuristic that is polytime, but for which an approximation guarantee does not hold in rare cases. Our starting point for the heuristic is problem BOPT. We aim to transform the problem further to remove constraints, but the inner max problem in x_S is not concave. However, a simple variable transform y_j = e^{−λ w^a_j x_j} makes the inner problem, now in y_S, concave with the same optimal objective value. Then, we form the Lagrangian dual of the inner max problem (in y_S) with guaranteed same solution due to concavity and consequent strong duality. The optimization after Lagrangian dualizing is max_{S∈F(K)} min_{ν,µ≥0} max_{y_S∈D_{y_S}} φ(S, ν, µ, y_S, δ_0), (DualOPT) where D(y_S) = Σ_{j∈S} N(y_j) and ν, (µ_l)_{l∈{1,...,L}} are the dual variables for constraints (2) and (3), respectively. We aim to switch the outer max and min in DualOPT; first, we present a general result on when such switching is possible. We present a general result about conditions for minimax equality when discrete variables are involved. We note that, as far as we know, there is no such result in the literature. Theorem 3. Let f : X × Y → R be a function such that the following are true: • X is a set with finitely many points. • Y is a convex set in a Euclidean space. • f(x, ·) is continuous and convex on Y. • For all x ∈ X and all b, the sub-level sets of f(x, ·) given by {y | f(x, y) ≤ b} are compact and convex (if Y is compact, this condition is implied and can be removed). • For any y* ∈ argmin_{y∈Y} max_{x∈X} f(x, y), the maximizer argmax_{x∈X} f(x, y*) is unique. Then min_{y∈Y} max_{x∈X} f(x, y) = max_{x∈X} min_{y∈Y} f(x, y). Proof. First, compact sub-level sets of a convex function imply that the minimizer exists. This can be seen easily by choosing a b such that the sub-level set is nonempty; the minimizer over this sub-level set is the same as the global minimizer.
As the sub-level set is compact, by the Weierstrass extreme value theorem the minimizer always exists. Then, by well-known convexity-preserving transforms, max_{x∈X} f(x, y) is convex in y. Also, its sub-level sets are intersections ∩_{x∈X} {y | f(x, y) ≤ b} of compact convex sets, hence compact. Thus, a minimizer y* of the convex function max_{x∈X} f(x, y) always exists. For each x ∈ X, let y_x denote any element in argmin_{y∈Y} f(x, y). Fix a y* ∈ argmin_{y∈Y} max_{x∈X} f(x, y). For this y*, let x* be the unique value (as assumed in the theorem) such that x* = argmax_{x∈X} f(x, y*). We first show that y* ∈ argmin_{y∈Y} f(x*, y), using proof by contradiction. Assume that y* ∉ argmin_{y∈Y} f(x*, y). Our assumptions in the theorem imply that the gap δ defined in (4) satisfies δ > 0. On the other hand, from the contradicting assumption and the definition of y_{x*}, we have f(x*, y*) > f(x*, y_{x*}). The convexity of f(x, y) in y implies that, for any ε ∈ (0, 1), f(x*, y* + ε d_y) ≤ (1 − ε) f(x*, y*) + ε f(x*, y_{x*}), where d_y = y_{x*} − y*. We note that, since f(x*, y*) > f(x*, y_{x*}), f is continuous, and y* + ε d_y lies on the line between y* and y_{x*}, there is an ε_0 such that this inequality is strict for all ε ∈ (0, ε_0). Thus, plugging this into the above convexity-derived result, which when rearranged and canceling out (1 − ε), gives (5) for any ε ∈ (0, ε_0). Since f(x, y) is continuous in y, we can always select ε ∈ (0, ε_0) small enough such that (6) holds, where δ is defined in (4). Now, we have a chain of inequalities in which (a) is due to (6), (b) is due to the selection of δ in (4), and (c) is due to the selection of ε in (6). Figure 1: Simple illustration of Theorem 3. Horizontal axis is the y-axis. Max functions are in bold. f_1, f_2 touch the origin. The above set of inequalities shows that x* is optimal for max_{x∈X} f(x, y* + ε d_y). Thus, from (5) and the above result, we have max_{x∈X} f(x, y* + ε d_y) < max_{x∈X} f(x, y*), which is contrary to the fact that y* ∈ argmin_{y∈Y} max_{x∈X} f(x, y). So our contradicting assumption must be false. Thus, we must have y* ∈ argmin_{y∈Y} f(x*, y).
Now we know that there are x* ∈ X and y_{x*} (in particular, choose y_{x*} = y*) such that (x*, y_{x*}) is optimal for min_{y∈Y} max_{x∈X} f(x, y). We have the chain min_{y∈Y} max_{x∈X} f(x, y) = f(x*, y_{x*}) = min_{y∈Y} f(x*, y) ≤ max_{x∈X} min_{y∈Y} f(x, y) ≤ min_{y∈Y} max_{x∈X} f(x, y). In other words, (x*, y_{x*}) is an optimal solution of max_{x∈X} min_{y∈Y} f(x, y) and the minimax equality holds, i.e., min_{y∈Y} max_{x∈X} f(x, y) = max_{x∈X} min_{y∈Y} f(x, y). This completes the proof. We provide a simple illustration with 1-d quadratic functions in Fig. 1 for better intuition of the above result. The exact functions used are stated in the appendix. Fig. 1a shows a situation where there is a non-unique maximizer (both 1 and 2) of min_y max(f_1(y), f_2(y)) for functions f_1, f_2 (occurring at y = 0.75 with value f_1(0.75) = f_2(0.75) = 0.5625). Whereas max_{1,2} min_y(f_1(y), f_2(y)), which is the min of each function separately and then the max from among those, occurs at y = 1 with value 0.5 ≠ 0.5625. Fig. 1b shows a situation where min_y max(f_1(y), f_2(y)) is uniquely determined by f_2 and hence is the same as max_{1,2} min_y(f_1, f_2). Continuing with our aim of solving DualOPT, we switch the outer max and min in DualOPT to obtain SwitchedDualOPT: min_{ν,µ≥0} max_{S∈F(K)} max_{y_S∈D_{y_S}} φ(S, ν, µ, y_S, δ_0). This switching is a heuristic, as the uniqueness condition in Theorem 3 may not hold for every possible parameter value of the problem at hand (the other conditions hold); see also the discussion after Lemma 1. In any case, SwitchedDualOPT provides tight bounds, which we use for a hybrid guaranteed approximate solution in the next section. Solving for fixed duals. We first show a polytime arbitrarily close approximation for the inner problem in SwitchedDualOPT. For any given fixed ν, µ ≥ 0, consider: max_{S∈F(K)} max_{y_S∈D_{y_S}} φ(S, ν, µ, y_S, δ_0). (FixedDuals) Observe that the objective φ(·) of FixedDuals is additively separable into terms g_j(ν, µ, y_j, δ_0) that each depend only on the single scalar variable y_j, i.e., φ(S, ν, µ, y_S, δ_0) = Σ_{j∈S} g_j(ν, µ, y_j, δ_0) + νm + Σ_l µ_l β_l. We use this separability to solve FixedDuals to any precision in polytime in Algorithm 2. First, in lines (2-4) we maximize g_j over y_j for each j ∈ K to obtain h_j(ν, µ, δ_0).
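For intuition on why the uniqueness condition in Theorem 3 matters, the following self-contained numerical check uses two hypothetical convex quadratics of our own choosing (not the exact functions of Fig. 1, which are stated in the appendix): at the minimizer of the pointwise max, both functions attain the max (a non-unique argmax over the discrete index), and a strict gap max-min < min-max appears:

```python
def f1(y):
    return y * y

def f2(y):
    return 2.0 * (y - 1.0) ** 2

def minimize(f, lo=-2.0, hi=3.0, n=200_001):
    # Dense grid search is accurate enough for these 1-d convex quadratics.
    return min(f(lo + i * (hi - lo) / (n - 1)) for i in range(n))

# min over y of max over the discrete index {1, 2}:
min_max = minimize(lambda y: max(f1(y), f2(y)))
# max over the discrete index of the separate minima:
max_min = max(minimize(f1), minimize(f2))
print(min_max, max_min)  # roughly 0.3431 vs 0.0: strict gap, max-min < min-max
```

Here min_max ≈ 6 − 4·sqrt(2) ≈ 0.3431 is attained where f1 and f2 cross (so the maximizing index is not unique), while each function separately has minimum 0; the weak inequality max-min ≤ min-max always holds, and the uniqueness condition is exactly what rules out this strict gap.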
With the h_j's, the objective of FixedDuals takes the form max_{S∈F(K)} Σ_{j∈S} h_j plus the constants νm + Σ_l µ_l β_l. Recalling the definition of F(K), we solve this by sorting the h_j's (line 5) and extracting the best h_j values from each of the L partitions of locations (lines 6-8). In Algorithm 2, FixedDualsSolver(ν, µ, δ_0): for each j ∈ K, let l be the index such that j ∈ K_l and compute h_j(ν, µ, δ_0) = max_{y_j ∈ [e^{−λ w^a_j}, 1]} g_j(ν, µ, y_j, δ_0) (lines 2-4); in a list H, store the indexes j sorted by h_j(ν, µ, δ_0) in descending order (line 5); partition H into H_1, ..., H_L, such that any j ∈ H_l satisfies j ∈ K_l and each H_l is still sorted by h_j (line 6). Extracting the best values from each partition ensures at least one center in each partition is selected. Then, among the remaining locations, we choose the best (as measured by h_j) centers (lines 9-12), ensuring at least N_P are chosen. The final choice of centers is S^o, and the algorithm returns S* = S^o and the y*_j's that maximize h_j (line 13). Next, we show that there is a closed-form formula for h_j in line 4 of Algorithm 2. This is a concave maximization, as g_j is concave in y_j. The closed-form formula enables much faster solving of prior security game models with a QR adversary (no subset selection). The main result in Yang et al. (2012) was a piecewise linear approximation based optimization solution of the standard security game with a QR adversary (no subset selection of targets), which we vastly improve upon by showing that the same can be computed to optimality using the closed-form formula in the lemma below (Lemma 1), in which W is the Lambert W function Corless et al. (1996). For further notational ease, let Φ(S, ν, µ, δ_0) = max_{y_S∈D_{y_S}} φ(S, ν, µ, y_S, δ_0). Using the solution y*_S from the above lemma, we get that Φ(S, ν, µ, δ_0) = φ(S, ν, µ, y*_S, δ_0). Note that y*_S is still a function of ν, µ, δ_0.
It can be readily checked that Φ satisfies the conditions of Theorem 3 with Φ as f, ν, µ as y, and S as x (δ_0 is a constant), except for the uniqueness of S* (x* in Theorem 3) for optimal ν*, µ* (y* in Theorem 3). However, observe that in Algorithm 2 the solution S* (for given ν*, µ*) may be non-unique only if h_j = h_i for some i ≠ j; the stringent equality needed makes non-uniqueness highly unlikely; indeed, we encounter non-uniqueness in only 1.1% of cases in experiments. We prove the following: Lemma 2. 1. Algorithm 2 runs in polynomial time O(|K| log |K|). 2. Algorithm 2 solves FixedDuals to any arbitrary fixed precision. Gradient descent on duals. Next, in order to solve SwitchedDualOPT, we perform projected gradient descent (PGD) on the dual variables ν, µ. Recall that the inner problem in SwitchedDualOPT is max_{S∈F(K)} max_{y_S∈D_{y_S}} φ(S, ν, µ, y_S, δ_0), which, with slight abuse of notation, we refer to as FixedDuals(ν, µ, δ_0); we already solved this in Algorithm 2 for given ν, µ, δ_0. The use of PGD is justified by the convexity of FixedDuals(ν, µ, δ_0) in ν, µ. Proof. We again skip writing δ_0 for ease of notation. Using the notation introduced and Lagrangian duality, the inner problem Φ(S, ν, µ) = max_{y_S∈D} φ(S, ν, µ, y_S) is convex in ν, µ. The function in question is max_{S∈F(K)} Φ(S, ν, µ). As this is a max over the multiple convex functions {Φ(S, ν, µ)}_{S∈F(K)}, this function is convex. Combining all sub-results for the polynomial heuristic, in Algorithm 3 PGD is used in lines 3-4, where P_{ν,µ≥0} denotes projection onto the space ν, µ ≥ 0; each iteration computes S, y_S = FixedDualsSolver(ν_t, µ_t, δ_0), repeats until the objective changes by less than ξ, and then returns S, y_S. The gradient of FixedDuals w.r.t. ν, µ can be computed using Danskin's Theorem Bertsekas et al. (1998) (details in appendix). Algorithm 3, when plugged in as the solver for BOPT in line 4 of the binary search Algorithm 1, is the full polynomial time heuristic.
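The projected gradient descent on the duals (lines 3-4 of Algorithm 3) reduces to a gradient step followed by a clamp at zero. A minimal sketch, assuming a hypothetical `grad` callback that returns the (sub)gradients obtained via Danskin's theorem:

```python
def pgd_duals(grad, nu0, mu0, step=0.05, iters=500):
    """Projected gradient descent on the dual variables (nu, mu) of
    SwitchedDualOPT.  grad(nu, mu) returns the (sub)gradient of the
    convex function max_S Phi(S, nu, mu); the projection P_{nu,mu>=0}
    is simply a clamp at 0."""
    nu, mu = nu0, list(mu0)
    for _ in range(iters):
        g_nu, g_mu = grad(nu, mu)
        nu = max(0.0, nu - step * g_nu)
        mu = [max(0.0, m - step * g) for m, g in zip(mu, g_mu)]
    return nu, mu
```

On a toy convex dual such as (ν − 1)² + (µ_1 + 2)², the iterates converge to the constrained minimizer ν = 1, µ_1 = 0, illustrating why convexity (shown above) justifies PGD.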
Using Theorem 1 and Lemma 2, it can be readily seen that the heuristic has a runtime of O(log(1/ε) |K| log|K| / ξ) (gradient descent takes O(1/ξ) iterations under mild smoothness conditions). Also, if the uniqueness condition of Theorem 3 holds for a problem instance, then we obtain an O(ξ + ε) approximation. However, as stated earlier, the uniqueness might not hold in rare cases, and hence we next propose a hybrid approach combining the advantages of this heuristic and the earlier MILP approach. This hybrid approach is inspired by the observation that the heuristic is time efficient, and if the minimax equality holds, the resulting solution is optimal for EqOPT. If the minimax equality does not hold, we can still use the solution of SwitchedDualOPT to construct tight lower and upper bounds for the MILP approach with guaranteed solution quality. Towards that end, we make use of the following result: Theorem 4. If we run the heuristic (Algorithm 3 plugged into line 4 of Algorithm 1) and obtain a solution (δ_0, S, x_S), then, with S*, x*_S optimal for EqOPT, there exists a small enough ξ (in Algorithm 3) such that (1) |F(S*, x*_S) − F(S, x_S)| ≤ |δ_0 − F(S, x_S)| + 2ε and (2) F(S, x_S) and δ_0 + 2ε can be used as lower and upper bounds for the MILP approach (ApxOPTL plugged into line 4 of Algorithm 1). Theorem 4 implies that if δ_0 is sufficiently close to F(S, x_S), then (S, x_S) is a near-optimal solution to EqOPT. Using the above result, we design a hybrid approach combining the MILP and the heuristic to efficiently find a near-optimal solution in Algorithm 4. It can be seen from the above result that if the algorithm stops at line 3, then the returned solution is additively 3ε-optimal for EqOPT, i.e., F(S*, x*_S) − F(S, x_S) ≤ 3ε. Otherwise, if the algorithm stops at line 7, then a guarantee is already established via Theorem 2, i.e., F(S*, x*_S) − F(S, x_S) ≤ O(1/K + ε).
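The control flow of the hybrid scheme (Algorithm 4) can be sketched as below; `heuristic` and `milp` are hypothetical callbacks of our own standing in for the heuristic (Algorithms 1+3) and the ApxOPTL solver:

```python
def hybrid(heuristic, milp, eps):
    """Hybrid scheme of Algorithm 4 (sketch): run the fast heuristic; if
    the Dinkelbach threshold delta_0 certifies the solution as near-optimal
    (Theorem 4, part 1), return it; otherwise fall back to the MILP,
    warm-started with the heuristic's lower bound F(S, x_S) and upper
    bound delta_0 + 2*eps (Theorem 4, part 2)."""
    delta_0, S, x_S, F_val = heuristic()
    if abs(delta_0 - F_val) <= eps:      # near-optimal: stop early
        return S, x_S
    return milp(lower=F_val, upper=delta_0 + 2 * eps)
```

In the experiments below, the early-stop branch fires in most instances, which is why the hybrid runs at essentially the heuristic's speed while retaining the MILP's guarantee in the remaining cases.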
So, it is guaranteed that a solution returned by Algorithm 4 is additively O(1/K + ε)-optimal for EqOPT. In line 6, we run the heuristic again to further improve the solution of the MILP. In experiments we show that in most cases the hybrid algorithm stops at line 3, making it much faster than the MILP approach. Our experiments are simulated over 10 random instances for each measurement that we report. In each game instance, payoffs are chosen uniformly at random, from 1 to 10 for r^d_j and r^a_j and from −10 to −1 for l^d_j and l^a_j. Following past user studies Yang et al. (2011), the parameter λ of the QR model is chosen as 0.76. We select C = ⌊2|K|/3⌋, N_P = ⌊|K|/2⌋, m = |K|/10, and split the set of centers into L = 5 disjoint partitions of equal size. For each partition l, we choose β_l = 2m/L. All experiments were conducted using Matlab on a Windows 10 PC with an Intel i7-9700 CPU (3.00GHz). We compare our algorithm (Algorithm 4, denoted as Hybrid) with two baseline methods. One is based on the convex optimization approach (denoted as ConvexOpt) proposed in prior work Yang et al. (2012); this method is not capable of choosing which centers to operate, hence we run it with all the centers chosen as operational. The other method is based on a two-step procedure (denoted as TwoSteps); for this we first solve EqOPT with fixed S = K (using ConvexOpt) to find a strategy x*. We then use this strategy to select at least N_P and at most C centers from K by sorting the individual rewards {w^d_j x*_j + l^d_j, j ∈ K}, selecting the N_P centers with highest rewards, and then, from the remaining centers, selecting no more than C − N_P centers with highest and positive rewards. Then, with the subset S* so selected, we solve EqOPT with fixed S = S* to re-optimize the strategy.
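The selection step of the TwoSteps baseline just described can be sketched as follows (function name is ours; this illustrates the ranking rule, not the paper's Matlab code):

```python
def two_steps_select(x, w_d, l_d, N_P, C):
    """Selection step of the TwoSteps baseline: rank centers by the
    individual reward w^d_j * x_j + l^d_j under the fixed strategy x,
    take the N_P highest, then from the rest take up to C - N_P more
    whose reward is positive."""
    rewards = {j: w_d[j] * x[j] + l_d[j] for j in range(len(x))}
    order = sorted(rewards, key=rewards.get, reverse=True)
    S = set(order[:N_P])
    S.update(j for j in order[N_P:C] if rewards[j] > 0)
    return S
```

EqOPT is then re-solved with S fixed to this subset to re-optimize the security allocation.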
We first determine the number of pieces K needed in our PWLA approach for a good approximation, noting that, as is typical for approximation algorithms, the bound in Theorem 2 can be too conservative for average-case problems. We vary K in {5, 10, ..., 30} and compute the percentage gap between the objective values given by PWLA with K pieces and with a large K = 200 pieces. Figure 2 plots the percentage gaps for |K| ∈ {20, 40, 60}, and we see that the percentage gaps become relatively small (less than 1.4%) if K ≥ 20. Thus, we fix K = 20 for the rest of the experiments. Expected rewards comparison. The means and standard errors of the expected rewards of the different approaches are plotted in Fig. 3, where in the left figure we vary the number of centers from 20 to 200 and set the resource budget as m = |K|/10, and in the right figure we fix |K| = 50 and vary the resource budget m from 2 to 20. The expected rewards of SwitchedDual are equal to those of Hybrid for 178/180 test instances, and only slightly smaller for 2/180 instances. This shows that the minimax equality holds with high probability. Also, Hybrid and SwitchedDual consistently outperform the other methods. In particular, for a varying number of centers, Hybrid provides 85%-96% larger rewards than TwoSteps. For a varying resource budget, the improvements are up to 142%. Computational scalability. In this experiment, we run Hybrid, SwitchedDual, ConvexOpt and MILP, where MILP refers to using ApxOPTL. We vary the number of potential centers from 50 to 500 with two settings, one with a fixed resource ratio m = 0.1|K| and one with a fixed resource budget m = 20. Fig. 5 shows the means and standard errors of the CPU time of the four approaches over 10 repetitions. We also see that the curves given by Hybrid and SwitchedDual are almost identical, indicating that Algorithm 4 mostly stops at line 3, i.e., the minimax equality holds.
There are only a few instances with |K| = 200 or |K| = 350 where Algorithm 4 stopped at line 7; even in these cases, the total time is still about 5 to 10 times less than the time required by the MILP, due to the tightness of the bounds provided by SwitchedDual in line 5. Note that ConvexOpt solves an easier problem with no selection of centers, but even then Hybrid runs faster than ConvexOpt on average. The time taken by MILP grows very fast as the number of centers increases, whereas the time required by Hybrid and SwitchedDual stays stable and small. More precisely, for the fixed resource ratio with |K| = 500, the times required by Hybrid, SwitchedDual, ConvexOpt and MILP are about 4, 4, 400 and 4700 seconds, respectively. To further demonstrate scalability, we increased the number of centers to 5000, and Hybrid and SwitchedDual finished in 70 seconds. In summary, the results show the superiority of our algorithms.

Fairness of security allocation. We evaluate the impact of the FSA constraints on the fairness of the security resource distribution. To this end, we select |K| = 20 and again divide all the centers into L = 5 partitions of equal size. To better illustrate fairness, we add 5 units to r^a_j for all j ∈ K_1, which implies that the adversary gets more reward if attacking a target in Partition #1. We also tighten the selection of the parameters β_l by choosing β_l = 1.2m/L for all l = 1, . . . , L. The box plot in Figure 4 reports the distributions of the total resources assigned to the 5 partitions, with and without the FSA constraints. Without the FSA constraints, the first partition has a high chance of being protected (an expected number of more than two resources is allocated to the four centers in Partition #1), thus lowering the protection of the other partitions. On the other hand, the FSA constraints maintain fairness of security allocation between partitions. In short, Fig. 4 clearly shows the role of the FSA constraints in maintaining fairness in security allocation.
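Under our reading of the FSA constraints in this experiment (each partition K_l receives at most β_l expected resources; this per-partition cap form is an assumption for illustration, and the helper names are ours), checking an allocation is straightforward:

```python
def partition_loads(x, partitions):
    """Total expected security resources allocated to each partition."""
    return [sum(x[j] for j in part) for part in partitions]

def satisfies_fsa(x, partitions, beta):
    """Check the fairness-of-security-allocation (FSA) constraints:
    partition l receives at most beta[l] expected resources (assumed cap form)."""
    return all(load <= b + 1e-9
               for load, b in zip(partition_loads(x, partitions), beta))
```

For example, with β_l = 1.2m/L as above, an allocation concentrating coverage on Partition #1 violates the cap, while a balanced one satisfies it.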
We proposed a model for the security of vaccine delivery in underdeveloped and security risk-prone areas, and presented an efficient solver using a novel strong duality result with discrete variables. A number of other service schemes require security as well (setting up voting centers, medical camps, etc.); we believe our model can serve as a basis for formally modeling such problems and their variants. Our technical approach can also inform other techniques where a mix of discrete and continuous variables is used.

A.1 Bilinear ILP and MILP

For ease of notation, let g_j(x_j) = N(x_j)(w^d_j x_j + l^d_j). We divide the range of x_j, i.e., [0, 1], into K equal intervals and represent each x_j = Σ_{k∈[K]} r_{jk}, where r_{jk} = 1/K if k ≤ ⌊Kx_j⌋, r_{jk} = x_j − ⌊Kx_j⌋/K if k = ⌊Kx_j⌋ + 1, and r_{jk} = 0 otherwise. We then approximate the univariate functions g_j(x_j) and N(x_j) as

N̄_j(x_j) = N_j(0) + Σ_{k∈[K]} γ^N_{jk} r_{jk},    ḡ_j(x_j) = g_j(0) + Σ_{k∈[K]} γ^g_{jk} r_{jk},

where N_j(0) = e^{λr^a_j}, and γ^N_{jk}, γ^g_{jk} are the slopes of N_j(x_j), g_j(x_j) on each small interval of [0, 1], defined as

γ^N_{jk} = K(N_j(k/K) − N_j((k−1)/K)),    γ^g_{jk} = K(g_j(k/K) − g_j((k−1)/K)).

To include r_{jk} in the optimization model, we introduce binary variables z_{jk} ∈ {0, 1} such that z_{jk} ≥ z_{j,k+1} for k ∈ [K − 1]. We constrain 0 ≤ r_{jk} ≤ 1/K for all k, use r_{jk} ≥ z_{jk}/K to force r_{jk} = 1/K if z_{jk} = 1, and use r_{j,k+1} ≤ z_{jk}/K to force r_{j,k+1} = 0 if z_{jk} = 0. For the first k such that z_{jk} = 0 (representing k = ⌊x_j K⌋ + 1), the only applicable constraint is 0 ≤ r_{jk} ≤ 1/K. Moreover, we also include binary variables θ_j ∈ {0, 1}, j ∈ K, to represent a subset S ⊆ K, i.e., θ_j = 1 if j ∈ S and θ_j = 0 otherwise. Combining all these, we can approximate (BOPT) by the following binary nonlinear program:

max_{θ,r,z} Σ_{j∈K} θ_j (g_j(0) − δ_0 N_j(0)) + Σ_{j∈K} Σ_{k∈[K]} (γ^g_{jk} − δ_0 γ^N_{jk}) θ_j r_{jk}    (ApxOPT)

subject to Σ_{j∈K} Σ_{k∈[K]} θ_j r_{jk} ≤ Km    (7)

z_{jk}, θ_j ∈ {0, 1}, ∀j ∈ K, k ∈ [K]    (13)

(constraints (8)–(12) encode the segment and selection restrictions described above). The above program contains the bilinear terms θ_j r_{jk}, which can be linearized using the well-known "big-M" approach Wu (1997).
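The segment representation x_j = Σ_k r_{jk} and the slope-based PWLA described above can be sketched as follows (function names ours); by construction the r_k sum back to x, and the approximation is exact at the breakpoints k/K.

```python
import math

def segment_decompose(x, K):
    """Split x in [0, 1] into K segment variables r_k as in the PWLA:
    r_k = 1/K for k <= floor(K*x), the fractional remainder on the next
    segment, and 0 afterwards, so that sum(r) == x."""
    full = math.floor(K * x)
    r = []
    for k in range(1, K + 1):
        if k <= full:
            r.append(1.0 / K)
        elif k == full + 1:
            r.append(x - full / K)
        else:
            r.append(0.0)
    return r

def pwla_value(f, x, K):
    """Evaluate the K-piece piecewise-linear approximation of f at x,
    using the segment slopes gamma_k = K * (f(k/K) - f((k-1)/K))."""
    r = segment_decompose(x, K)
    slopes = [K * (f(k / K) - f((k - 1) / K)) for k in range(1, K + 1)]
    return f(0.0) + sum(s * rk for s, rk in zip(slopes, r))
```

The binary variables z_{jk} in the program above exist only to enforce this fill-left-to-right pattern on the r_{jk}.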
However, this requires K|K| additional variables and 3K|K| additional constraints, making such an approach not scalable at all. In the following result we show that (ApxOPT) can be formulated as a MILP with no additional variables and only |K| additional constraints. In particular, we claim that (ApxOPT) is equivalent to the MILP obtained by linearizing the bilinear terms and adding the constraints θ_j ≥ z_{j1}, ∀j ∈ K, together with constraints (9–13). This MILP is exactly the one shown in the statement of Theorem 2 in the main paper; we show the proof of the claim in the proof of Theorem 2.

Using y_j = e^{−λw^a_j x_j}, the complete transformed optimization is over y_S ∈ D, where D is the convex set given by the Cartesian product ×_{j∈S} [e^{−λw^a_j}, 1], N(y_j) = y_j e^{λr^a_j} and D(y_S) = Σ_{j∈S} N(y_j).

Briefly, Danskin's Theorem states that given f(x) = max_{z∈Z} g(x, z) for a compact set Z with g(x, z) convex in x for all z, and a unique maximizer z* of max_{z∈Z} g(x, z) for any x, we have ∇f(x) = ∇_x g(x, z*), where z* is treated as a constant. Recall that FixedDuals is max_{S, y_S∈D} φ(S, ν, µ, y_S). By Danskin's theorem, given the unique optimal S*, y*_S (which we assume in this heuristic) and φ convex in ν, µ, we obtain the required derivative.

A.4 Functions used in Fig. 1

The two functions in Fig. 1a are f_1(y) = y² and f_2(y) = 0.5 + (y − 1)². The problem max_{i∈{1,2}} min_x f_i(x) considers the min of each function separately and then chooses the max among those. f_1 has min value 0 at x = 0 and f_2 has min value 0.5 at x = 1; thus the outer max chooses solution value 0.5 at x = 1. The solution of min_x max_{i∈{1,2}} f_i(x) is at x = 0.75 with value 0.5625, which is a different solution from the max-min case. Also, note that at x = 0.75 we have f_1(0.75) = f_2(0.75); thus the inner problem max_{i∈{1,2}} f_i(x) does not have a unique solution. This shows that uniqueness is needed for the minimax equality to hold.

Proof (of Theorem 1).
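The Fig. 1a example can be checked numerically; the small grid-search sketch below (grid bounds and resolution are our choices) recovers both values and shows the max-min/min-max gap.

```python
def f1(x):
    return x * x

def f2(x):
    return 0.5 + (x - 1) ** 2

def grid(a, b, n):
    """Uniform grid of n + 1 points on [a, b]."""
    return [a + (b - a) * i / n for i in range(n + 1)]

xs = grid(-1.0, 2.0, 3000)

# max_i min_x f_i(x): minimize each function separately, then take the max.
max_min = max(min(f1(x) for x in xs), min(f2(x) for x in xs))

# min_x max_i f_i(x): take the pointwise max first, then minimize over x.
min_max = min(max(f1(x), f2(x)) for x in xs)
```

The grid search gives max_min ≈ 0.5 (attained at x = 1) and min_max ≈ 0.5625 (attained at x = 0.75, where the two functions cross), matching the discussion above.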
We first prove the claim that f is a strictly monotonically decreasing function of δ, and hence f⁻¹ is well-defined. This can be verified readily since D(x_S) > 0 for any feasible x_S. Suppose the optimal solution value is f(δ) for δ. For contradiction, assume f(δ') ≥ f(δ) for some δ' > δ, with optimal solution S', x'_{S'} for δ'. Clearly, S', x'_{S'} is feasible for δ also, and achieves objective value f(δ') + (δ' − δ)D(x'_{S'}), which is more than f(δ), a contradiction. Thus, we must have f(δ') < f(δ). Hence, f⁻¹ is well-defined. The Lipschitz constants are derived at the end.

Next, as defined, f(δ) is the ideal value of obj in the algorithm if the BOPT solver has no error. Suppose S_δ, x_{S,δ} is optimal for some δ and the approximation provides S', x'_{S'} with objective value within ξ of the objective value of S_δ, x_{S,δ}. The ξ approximation guarantee gives, for any δ,

|BOPT(δ, S', x'_{S'}) − f(δ)| ≤ ξ.    (16)

Let δ* be the fixed value such that f(δ*) = 0; we seek δ* in this algorithm. By definition, δ* = F(S*, x*_S), where S*, x*_S is an optimal variable value for EqOPT. By the Lipschitzness assumption in the theorem, for any δ, δ',

d|δ − δ'| ≤ |f(δ) − f(δ')|.    (17)

When the binary search stops, we have U − L ≤ ε. Suppose U and L were last updated with corresponding δ_U and δ_L (in line 5); then δ_U − δ_L ≤ ε. Also, suppose that when the binary search stops, the solution for the variables is S, x_S. We hope to have δ* ∈ (δ_L, δ_U), but cannot claim this for sure because of the ξ error. We consider the case that δ* does not lie in (δ_L, δ_U), and we bound how far δ* is from δ_U, δ_L. WLOG, we assume U was last updated before the binary search stopped (the case of L last updated can be handled exactly analogously). Next, suppose the output of the binary search is δ_0, S, x_S; we know that δ_0 = δ_U. We know that the objective BOPT(δ_U, S, x_S) with δ_U was negative (that is, obj was negative), because U was updated. Suppose δ* > δ_U; then, since f is decreasing, we obtain f(δ*) < f(δ_U), meaning f(δ_U) is positive.
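The binary search analyzed here can be sketched as follows, with an exact oracle standing in for the ξ-approximate BOPT solver (the interface is ours):

```python
def binary_search_root(f, lo, hi, eps):
    """Binary search for delta* with f(delta*) = 0, for f strictly decreasing:
    keep U where the observed objective is negative and L where it is positive,
    stopping once U - L <= eps. With an oracle that only evaluates f to within
    xi, the analysis above shows the returned delta is within (1/d)*xi + eps
    of delta*, where d lower-bounds the Lipschitz constant of f."""
    L, U = lo, hi
    while U - L > eps:
        mid = (L + U) / 2
        if f(mid) < 0:
            U = mid   # f(mid) negative: delta* is (approximately) to the left
        else:
            L = mid   # f(mid) positive: delta* is (approximately) to the right
    return (L + U) / 2
```

With an exact oracle (ξ = 0), the output is within ε of the root, matching the special case of the bound.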
Using Equation (16), we get BOPT(δ_U, S, x_S) ≥ f(δ_U) − ξ. With the knowledge that BOPT(δ_U, S, x_S) is negative and f(δ_U) is positive, we can readily infer that f(δ_U) ≤ ξ. Further, using Equation (17), f(δ_U) ≤ ξ, f(δ*) = 0 and positive f(δ_U) give |δ* − δ_U| ≤ (1/d)ξ.

A similar reasoning applies for δ* < δ_L. We know that the objective BOPT(δ_L, S', x'_{S'}) with δ_L was positive (that is, obj was positive) when L was last updated. Note that we use S', x'_{S'} ≠ S, x_S, as we assumed U was the last update in the binary search, whereas S', x'_{S'} is from when L was last updated. As δ* < δ_L, since f is decreasing we obtain f(δ*) > f(δ_L), meaning f(δ_L) is negative. Using Equation (16), we get BOPT(δ_L, S', x'_{S'}) ≤ f(δ_L) + ξ. With the knowledge that BOPT(δ_L, S', x'_{S'}) is positive and f(δ_L) is negative, we can readily infer that f(δ_L) ≥ −ξ. Further, using Equation (17), f(δ_L) ≥ −ξ and negative f(δ_L) give |δ* − δ_L| ≤ (1/d)ξ.

Recall the output of the binary search is δ_0, S, x_S. Combined with the possibility that δ* may lie in (δ_L, δ_U), the worst case error of choosing output δ_0 = δ_U (or δ_0 = δ_L in the analogous case) is (1/d)ξ + ε. Thus, we showed above that |δ* − δ_0| ≤ (1/d)ξ + ε. Also, S, x_S is the ξ-approximate solution of problem BOPT with δ_0; thus |f(δ_0) − BOPT(δ_0, S, x_S)| ≤ ξ. Since f(δ*) = 0 and |f(δ_0)| = |f(δ_0) − f(δ*)| ≤ D|δ_0 − δ*|, combining these two facts we obtain |BOPT(δ_0, S, x_S)| ≤ D((1/d)ξ + ε) + ξ. Combining, using the triangle inequality, with the proved result that |δ* − δ_0| ≤ (1/d)ξ + ε and the fact that δ* = F(S*, x*_S), we get |F(S*, x*_S) − F(S, x_S)| ≤ (1/d)ξ + ε + (D((1/d)ξ + ε) + ξ)/D(x_S). Also, D(x_S) is smallest when all x's are 1 (as a worst case bound), giving D(x_S) ≥ Σ_{j∈S} e^{l^a_j}. With a worst case over S, we have D(x_S) ≥ e^{min_{j∈K} l^a_j} for any S. With the lower bound constant L assumed, we get D(x_S) ≥ e^L. Thus, |F(S*, x*_S) − F(S, x_S)| ≤ (1/d)ξ + ε + e^{−L}(D((1/d)ξ + ε) + ξ). The above is O(ξ + ε) as d, D, L are constants.

Details on Lipschitzness (intuitive, and can be skipped by a time-constrained reader; included here for completeness). Let S', x'_{S'} be the optimal solution for δ'.
We can easily get that, for any δ' > δ (WLOG), f(δ) − f(δ') ≥ (δ' − δ)D(x'_{S'}), which, using the fact proved above that D(x_S) ≥ e^L, gives |f(δ) − f(δ')| ≥ e^L|δ' − δ|; thus d = e^L (as δ, δ' are arbitrary choices). Similarly, if S, x_S is the optimal solution for δ, then f(δ) − f(δ') ≤ (δ' − δ)D(x_S). Now, it is easy to show that D(x_S) is largest when all x's are 0 (as a worst case bound), giving D(x_S) = Σ_{j∈S} e^{r^a_j}. But note that at most C centers can operate; hence, with a worst case over S, we have D(x_S) < |C|e^{max_{j∈K} r^a_j} for any S. As |C| and max_{j∈K} r^a_j are bounded, say by C_0 and U_0, this gives the Lipschitz constant D = C_0 e^{U_0}.

Proof (of Theorem 2). To analyze (ApxOPTL), we request the reader to first read Appendix A.1 to get familiar with the problem (ApxOPTL) introduced there. Let f_1(θ, r, z) and S_1 be the objective function and feasible set of (ApxOPT), and f_2(θ, r, z) and S_2 be the objective and feasible set of (ApxOPTL). We see that S_2 ⊂ S_1 and f_1(θ, r, z) = f_2(θ, r, z) for all (θ, r, z) ∈ S_2; thus max_{(θ,r,z)∈S_1} f_1(θ, r, z) ≥ max_{(θ,r,z)∈S_2} f_2(θ, r, z). Now let (θ*, r*, z*) be an optimal solution to (ApxOPT); we create a solution (θ', r', z') with θ' = θ* and, for every j ∈ K with θ*_j = 0, z'_{jk} = r'_{jk} = 0 for all k ∈ [K] (copying r*, z* for the other j). We can see that (θ', r', z') is feasible for both problems and f_1(θ*, r*, z*) = f_1(θ', r', z') = f_2(θ', r', z'). From the fact that f_1(θ*, r*, z*) ≥ max_{(θ,r,z)∈S_2} f_2(θ, r, z) ≥ f_2(θ', r', z'), we have that any optimal solution to (ApxOPTL) is also optimal to (ApxOPT), as desired. Since (ApxOPT) approximates BOPT, so does (ApxOPTL).

For the performance bound, let H = Σ_j [w^d_j e^{λr^a_j}(1 + λ max{|l^d_j|; |l^d_j + w^d_j|}) + δ_0 λw^a_j e^{λr^a_j}]. Let N̄(x_j) and ḡ_j be the piecewise-linear approximations of N(x_j) and g_j(x_j).
For any x_j ∈ [0, 1], let k ∈ [K] be such that x_j ∈ [(k − 1)/K, k/K]. It can be seen that there is α ∈ [0, 1] such that N̄_j(x_j) = αN_j((k − 1)/K) + (1 − α)N_j(k/K), which gives

|N̄_j(x_j) − N_j(x_j)| ≤ (1/K) max_{x_j} |N'(x_j)| ≤ (1/K) λw^a_j e^{λr^a_j}.    (18)

Similarly, we also have

|ḡ_j(x_j) − g_j(x_j)| ≤ (1/K) max_{x_j} |g'_j(x_j)| ≤ ψ, where ψ = (1/K)(w^d_j e^{λr^a_j} + λw^d_j e^{λr^a_j} max{|l^d_j|; |l^d_j + w^d_j|}).    (19)

Now, let F̄(S, x_S) be the approximated objective function of B(S, x_S), i.e., F̄(S, x_S) = Σ_j ḡ_j(x_j) − δ_0 Σ_j N̄_j(x_j). From the inequalities in (18) and (19) we have |F̄(S, x_S) − B(S, x_S)| ≤ H/K. Now, let (S', x'_{S'}) be an optimal solution to the approximate problem and (S*, x*_S) be an optimal solution to (BOPT). We can write

B(S*, x*_S) − B(S', x'_{S'}) ≤ B(S*, x*_S) − F̄(S', x'_{S'}) + H/K.    (20)

Since F̄(S', x'_{S'}) ≥ F̄(S*, x*_S) ≥ B(S*, x*_S) − H/K and F̄(S', x'_{S'}) ≤ B(S', x'_{S'}) + H/K ≤ B(S*, x*_S) + H/K, we have |B(S*, x*_S) − F̄(S', x'_{S'})| ≤ H/K. Combining this with (20), we obtain |B(S*, x*_S) − B(S', x'_{S'})| ≤ 2H/K, as desired.

Proof (of Lemma 1). First, writing the solution in expanded form:

y_j = e^{−λw^a_j} for β_j ≥ 1;  y_j = e^{−λw^a_j β_j} for 0 < β_j < 1;  y_j = 1 for β_j ≤ 0.

Next, the objective is

y_j e^{λr^a_j} (−w^d_j log(y_j)/(λw^a_j) + l^d_j − δ_0) + (ν + µ_l) log(y_j)/(λw^a_j).

While this objective is concave, we show that the original formulation in x's provides an easier path to a closed form solution.

B.5 Proof of Theorem 4

Proof. Recall Φ(S, ν, µ, δ_0) = max_{y_S∈D} φ(S, ν, µ, y_S, δ_0) (the objective function of SwitchedDualOPT). For BOPT, we want to solve DualOPT, but instead we solve SwitchedDualOPT, and that too approximately, with additive error ξ. However, the first half of the proof of Theorem 1 does not depend on what BOPT is, except for the result that f (which is SwitchedDualOPT here, when solved exactly) is monotonic in δ. In order to run the heuristic SwitchedDualOPT within Algorithm 1, it is necessary that the objective value of SwitchedDualOPT be monotonically decreasing in δ_0. To prove this, we first note that if u, u' are two functions from a compact set Y to R with u(y) ≥ u'(y) for all y ∈ Y, then max_{y∈Y} u(y) ≥ max_{y∈Y} u'(y) and min_{y∈Y} u(y) ≥ min_{y∈Y} u'(y). To verify the first inequality, let y* be an optimal solution to max_{y∈Y} u'(y).
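The interpolation bound used here (deviation at most (1/K) max|f'|) can be checked numerically; the sketch below uses our own helper names, with N(x) = e^{λ(r^a − w^a x)} as the approximated function and λ = 0.76, r^a = 5, w^a = 8 as illustrative constants.

```python
import math

def pwl_interp(f, x, K):
    """K-piece piecewise-linear interpolant of f on [0, 1], evaluated at x."""
    k = min(int(x * K), K - 1)   # segment index: x lies in [k/K, (k+1)/K]
    x0, x1 = k / K, (k + 1) / K
    t = (x - x0) * K             # position within the segment, in [0, 1]
    return (1 - t) * f(x0) + t * f(x1)

def max_pwla_gap(f, K, samples=2000):
    """Largest deviation between f and its K-piece interpolant on [0, 1]."""
    return max(abs(pwl_interp(f, i / samples, K) - f(i / samples))
               for i in range(samples + 1))
```

The observed gap sits well inside the (1/K) max|N'| bound and shrinks as K grows, consistent with the 2H/K guarantee.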
We then have max_{y∈Y} u'(y) = u'(y*) ≤ u(y*) ≤ max_y u(y). The second inequality can be verified similarly by letting y** = argmin_y u(y), which gives min_{y∈Y} u(y) = u(y**) ≥ u'(y**) ≥ min_y u'(y). We now come back to the objective function of SwitchedDualOPT. Given δ_1 ≥ δ_2, for any S, ν, µ, y_S we have φ(S, ν, µ, y_S, δ_1) ≤ φ(S, ν, µ, y_S, δ_2). The above remark then tells us that min_{ν,µ} max_S Φ(S, ν, µ, δ_1) ≤ min_{ν,µ} max_S Φ(S, ν, µ, δ_2), which implies the monotonicity of the objective function of SwitchedDualOPT, as desired.

The Lipschitz constants can be derived in the same manner as in Theorem 1; for notational simplicity we call them d, D here again. In this part, we choose ξ small enough that (1/d)ξ ≤ ε. Thus, using the partial result from Theorem 1 that |δ_0 − δ*| ≤ (1/d)ξ + ε, we get |δ_0 − δ*| ≤ 2ε. Recall that f(δ*) = 0. Thus, after running Algorithm 1 with f = SwitchedDualOPT, we obtain δ_0 such that min_{ν,µ} max_S Φ(S, ν, µ, δ_0 + 2ε) ≤ 0. Moreover, from the minimax inequality (which always holds) we have min_{ν,µ} max_S Φ(S, ν, µ, δ_0 + 2ε) ≥ max_S min_{ν,µ} Φ(S, ν, µ, δ_0 + 2ε); thus max_S min_{ν,µ} Φ(S, ν, µ, δ_0 + 2ε) ≤ 0. The latter is DualOPT, which has the same optimum as BOPT. Then, if S*, x*_S is the optimal solution for EqOPT, the above leads to the inequality F(S*, x*_S) ≤ δ_0 + 2ε. On the other hand, clearly F(S*, x*_S) ≥ F(S, x_S). This directly gives the upper and lower bound. The inequality F(S*, x*_S) ≤ δ_0 + 2ε also gives F(S*, x*_S) − F(S, x_S) ≤ δ_0 − F(S, x_S) + 2ε ≤ |δ_0 − F(S, x_S)| + 2ε.

Returning to the proof of Lemma 1: the original formulation in x's provides an easier path to a closed form solution. Transforming the variable back to x's using y_j = e^{−λw^a_j x_j} for 0 ≤ x_j ≤ 1, we get the objective as a function of x_j. With a little manipulation, it can be written as a e^{−bx_j} x_j − cx_j + d for constants a > 0, b > 0, c > 0, d.
This function can be readily checked to be unimodal (one maximum, with a monotonic drop on either side), though not concave. The first order condition for this is a e^{−bx_j}(1 − bx_j) = c, which gives, by the definition of the Lambert W function Corless et al. (1996), z e^z = ce/a for z = 1 − bx_j, i.e., z = W(ce/a). Then, using the definition of z, we solve for x_j. Denoting β = (1 − W(ce/a))/b (with a, b, c the constants for center j), the solution gives x_j = β. However, x_j is constrained to lie in [0, 1], and using the fact that the objective is unimodal, we can state that if β is more than 1 then x_j = 1, and if β is less than 0 then x_j = 0. This gives the required result.

Proof. For part (1), all expressions in the algorithm can be evaluated exactly, except for the solution of the optimization in line 4. But the optimization in line 4 can be computed by the closed form formula given in Lemma 1. The closed form requires computing the Lambert W function, which is possible to arbitrary constant precision in O(1) evaluations of the exponential function Johansson (2020). This gives us the desired result. For part (2), the optimization in line 4 can be computed in O(1) evaluations of the exponential function Johansson (2020), which is overall O(1) as exponentiation is assumed to take constant time. All loops run for a maximum of |K| iterations, and the operations in them are all constant time. Other than the loops, the remaining steps are sorting H (of length |K|) and partitioning it. The runtime is dominated by sorting the |K|-sized array, which gives O(|K| log |K|).
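The closed form thus reduces to one Lambert W evaluation plus clipping. The sketch below maximizes h(x) = a x e^{−bx} − cx + d on [0, 1] with a hand-rolled Newton iteration for the principal branch of W (in practice scipy.special.lambertw could be used instead; the constants and names here are ours):

```python
import math

def lambert_w(z, iters=50):
    """Principal branch of the Lambert W function for z >= 0, via Newton's
    method on w * exp(w) - z = 0, starting from log(1 + z)."""
    w = math.log1p(z)
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (w + 1))
    return w

def argmax_unimodal(a, b, c):
    """Maximizer on [0, 1] of h(x) = a*x*exp(-b*x) - c*x + d (a, b, c > 0).

    The first-order condition a*exp(-b*x)*(1 - b*x) = c becomes, with
    z = 1 - b*x, the Lambert equation z*exp(z) = c*e/a, so
    x = (1 - W(c*e/a)) / b, clipped to [0, 1] by unimodality."""
    beta = (1.0 - lambert_w(c * math.e / a)) / b
    return min(1.0, max(0.0, beta))
```

A grid search over [0, 1] recovers the same maximizer, and a large c (steep linear penalty) pushes β below 0, so the clipped solution is x = 0, as the unimodality argument predicts.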