key: cord-0656709-pfjwint4 authors: Luo, Fengqiao title: A Distributionally-Robust Service Center Location Problem with Decision Dependent Demand Induced from a Maximum Attraction Principle date: 2020-11-25 journal: nan DOI: nan sha: d67e582e0867412cce6bfba5251eb01e272fc4b0 doc_id: 656709 cord_uid: pfjwint4 We establish and analyze a service center location model with a simple but novel decision-dependent demand induced from a maximum attraction principle. The model formulations are investigated in the distributionally-robust optimization framework. A statistical model that is based on the maximum attraction principle for estimating customer demand and utility gain from service is established and analyzed. The numerical experiments show that the model admits high computational efficiency in solving mid-size instances. The facility location problem is one of the most fundamental problems investigated in operations research. In this problem, a decision maker needs to decide locations of a limited number of facilities (factories, retail centers, power plants, service centers, etc.), and determine coverage of demand from different sites by the located facilities. The objective is to minimize the facility setup cost and the cost of production and delivery. This problem provides a basic framework to formulate related problems in resource allocation, supply chain management and logistics, etc. The facility location models in the stochastic and robust (distributionally-robust) optimization framework have also received sufficient investigation. In the stochastic optimization framework, the customer demand can be random parameters following a known or partially known probability distributions, while in the robust (distributionally-robust) optimization framework customer demand (the probability distribution of customer demand) can have a certain level of uncertainty (ambiguity). Incorporating randomness or uncertainty in customer demand estimation is a sensible and realistic model improvement, as the decision maker has no perfect prediction on the demand in practice. Recently, there is a trend of research on distributionally-robust optimization problems with decision-dependent ambiguity, in which the ambiguity set is specified by decision dependent parameters. Introducing decision dependency has certain merit in situations when model parameters are naturally functions center in order to receive service. This problem is motivated from clinic or medical test center (such as for COVID-19 screen test) allocation. The objective is to maximize the total utility gain of all customers who have decided to receive service from the opened service centers. The decision dependence is decoupled from the ambiguity set in our model, which is different from (Basciftci et al. 2020) . Specifically, we impose the decision dependency on the demand itself but not on the uncertainty of demand, and the decision-dependent demand is induced from a maximum attraction principle based on a ranking of opened service centers in the neighborhood. Our investigation shows that the decoupling approach leads to high computational efficiency of solving the problem and it can be more data-driven in practice. Furthermore, the novel approach of modeling decision dependency introduce additional combinatorial properties to the model which is of independent interest. The contribution of this paper is summarized as follows: 1. A novel approach of modeling decision dependency of the customer demand has been established for the service center location problem, with a possible extension on the modeling of decision dependency. In this approach, the decision dependency is decoupled from the ambiguity set which admits highly computational tractable reformulations. 2. A learning model has been established to estimate utilities and demand based on data from survey. 3. The numerical experience with this model shows that mid-and large-size instances can be solved very efficiently due to decoupling of decision dependency from the ambiguity set. Facility location problems with non-deterministic demand have received plentiful investigation in the framework of stochastic optimization and robust (distributionally-robust) optimization. In the stochastic optimization framework, customer demand are independent random parameters with known probability distributions and the problem can be formulated as a two-stage stochastic program in which the location vector is in the first-stage decision that needs to be made before realization of customer demand (Louveaux and Peeters 1992, Albareda-Sambola et al. 2011) . In a special case when the customer demand rates and service rates of each facility are assumed to follow exponential probability distributions, the problem of minimizing long term average cost can be reformulated as a deterministic optimization problem with corresponding Poisson rates to characterize the demand and service. This setting has been applied to an allocation problem of ATMs (Wang et al. 2002) . The probability distribution of demand can also be used to define chance constraints to ensure a required service level under possible stockout and supply disruption (Murali et al. 2012 , Gülpınar et al. 2013 , Lim et al. 2013 , Li et al. 2017 ). In the robust (distributionally-robust) optimization framework for facility location problems, the information of demand is partially known to the decision maker, and the goal is to find a robust optimal location vector that optimizes the objective after a worst-case realization of demand information (Snyder 2006) . Following this direction of research, Baron and Milner 2010 investigated a robust multi-period facility location problem with box and ellipsoid sets of uncertainty. Gourtani et al. 2020 investigated a distributionally-robust two-stage facility location problem with an ambiguity set defined corresponding to the mean and covariance matrix of a random parameter for expressing the demand. Facility location problems with demand uncertainty have been studied with a variety of novel application background, which includes but not limited to medication coverage and delivery under a large-scale bio-terror attack (Murali et al. 2012) , medical equipment (defibrillators) location problem to reduce cardiopulmonary resuscitation (CPR) risk (Chan et al. 2017) , humanitarian relief logistics (Döyen et al. 2012) , and hazardous waste transportation (Berglund and Kwon 2014) , etc. In a lot of real world problems, the uncertainty of parameters can interplay with the decision to be made. This behavior is well observed especially in a sequential (multi-stage) decision-making process, in which information about system parameters are gradually revealed and the decisions made up until the current stage can reshape the uncertainty in future . It motivates the research on multi-stage stochastic optimization with decision-dependent uncertainty. Solution strategies based on Lagrangian duality and novel branch-and-bound methods are developed to solve this family of problems (Goel and Grossmann 2005 , Gupta and Grossmann 2011 , Tarhan et al. 2013 , and it has a rich application in oil-chemical industrial (Goel and Grossmann 2004 , Tarhan and Grossmann 2008 , Tarhan et al. 2009 ). An approximation scheme (Vayanos et al. 2011 ) is proposed to tackle the high complexity in solving the multi-stage problems with decision-dependent information discovery. Decision-dependent uncertainty has also been considered in stochastic optimization problems such as resource management (Tsur and Zemel 2004) , stochastic traffic assignment modeling (Shao et al. 2006) , and robust network design (Ahmed 2000 , Viswanath et al. 2004 ). Imposing decision-dependent uncertainty for robust (distributionally-robust) optimization has received great attention in recently years Sharma 2018, Luo and INFORMS Journal on Optimization 00(0), pp. 000-000, © 0000 INFORMS Noyan et al. 2017 , Basciftci et al. 2020 , Luo and Mehrotra 2019a . Nohadani and Sharma 2018 studied robust linear programs with decision dependent budget-type uncertainty and its generalization with a polyhedral uncertainty set. This concept is demonstrated in a robust shortest-path problem, where the uncertainty is resolved progressively when approaching the destination. Noyan et al. 2017 investigated a family of distributionally-robust optimization problems with an ambiguity set defined using earth mover's distances (including total variation distance and the Wasserstein metric) with decision-dependent parameters such as the nominal probability distribution and the radius, and focused on understanding which settings can lead to tractable formulation. Royset and Wets 2017 provided a variational principle analysis for optimization under stochastic ambiguity, which gives fruitful tools for analyzing solution quality and price of robustness. Some novel applications are studied in (Spacey et al. 2012) for robust software partition, and in (Nohadani and Roy 2017) for radiation therapy design. The work in (Basciftci et al. 2020) and (Luo and Mehrotra 2019b) are mostly related to the work of this paper. Basciftci et al. 2020 investigated a distributionally-robust facility location problem with decision-dependent ambiguity, where the ambiguity set is defined using disjointed lower and upper bounds on the mean and variance of the candidate probability distributions for customer demand, and these bounds are defined as linear functions of the location vector to admit a tractable MILP reformulation. The model considered in (Luo and Mehrotra 2019b ) assumed deterministic customer demand and imposed a linear decision dependency on the moment bounds for candidate probability distributions of utility, which leads to a MISOCP reformulation. We consider a problem of allocating a set of service centers (facilities) in a region to meet the customer demand from the region. Similarly to the traditional facility location problem, the region consists of customer sites and candidate locations for the service centers. Each customer site has certain demand that needs to be fulfilled by a service center from its neighborhood. Especially, we consider the case that the demand of each customer site is a random parameter which depends on the locations of service centers. In most literature on decision-dependent robust (distributionallyrobust) optimization, the decision-dependency is formulated as a linear function. In this problem, we consider a special form of decision dependency that is based on an assumption of maximum attraction principle, which will be clear in Section 2.1. We first formulate a deterministic model of this problem, in which we treat the demands as deterministic parameters that depend on the locations of service centers and introduce the notion of maximum attraction principle. The notations used in this model is given by the following list: S the set of customer sites; F the set of candidate locations for service centers; b j the cost of opening a service center at location j ∈ F; B the budget of allocating service centers; C j the service capacity of a service center located at j ∈ F; u ij the utility gain obtained by a customer from site i ∈ S who gets service from the service center at j ∈ F; y j the binary decision variable of opening a service center at location j ∈ F; D i (y) the demand from customer site i that depends on the decision vector y; x ij the demand flow from customer site i ∈ S to the service center at j ∈ F. The deterministic model is formulated as follows: (1) The first three constraints represent the limit of budget, capacity and the amount of demand, respectively. Note that in general the demand from site i ∈ S can depend on the pattern of service center locations which is represented by the decision vector y. In this paper, we investigate a specific type of location dependent demand which is simple but sensible in depicting customers' behavior in practice. This type of location dependency of demand is referred as the maximum attraction principle in this paper. We now give a detailed description of this principle. First, each customer site i is associated with a preferable subset F i of candidate facility locations. When there is only one service center opened at j ∈ F i , it will attract D ij amount of demand from i. If there are multiple service centers opened at locations in F i , the maximum demand that can be attracted from i is equal to the maximum D ij for j ∈ F i that has a facility. This principle is formally described by the following equation: where e j is the |F i |-dimensional vector with the j-th entry being 1 and other entries being 0, and the set F i (y) is defined as F i (y) := {j ∈ F i | y j = 1}. The maximum attraction principle matches with our intuition from practice. This principle first assumes that locations of service centers that are not within the preferable location set F i are not attractive to customers from site i at all. The most natural way of establishing F i is based on the distance from site i to the candidate locations. This is a reasonable assumption because customers usually will not consider visiting a service center that is beyond a certain distance from their living place. The maximum attraction principle further assumes that within the preferable opened service centers F i (y), there exists one service center j * (or multiple centers) that is (are) most attractive to the customers from site i in the sense that j * ∈ argmax j∈F i (y) D ij , and the presence of multiple service centers in F i (y) including j * attracts the same amount of demand from site i as the presence of just a single service center at j * . This assumption is also consistent with our intuition from practice. If every service center is identical in the sense of scale and service quality, the most attractive one to site i is likely the one that is most close to i. Furthermore, customers who are willing to visit further service centers are also willing to visit the one that is most close to their living place. Note that the maximum attraction principle can be extended to incorporate the impact of the number of opened service centers in F i on the demand. This extension is discussed in Section 2.2. We also define the subset S j as S j := {i ∈ S | j ∈ F i } for all j ∈ F. Based on the maximum attraction principle, the service center location problem (1) is written as the follows: To go one step further, we can linearize the term max j∈F i (y) D ij by introducing some continuous auxiliary variables q ij to form it as a convex combination of D ij for j ∈ F i . After this transformation, we obtain the following equivalent formulation of (3): Notice that in the above reformulation, the auxiliary variables {q ij | j ∈ F i } are used select the most attractive candidate location driven by the sense of maximizing the objective. The constraint q ij ≤ y j ensures that only opened service centers in F i are involved in the maximum attraction principle. Since j∈F i q ij ≤ 1, the model will set q ij * = 1 and q ij = 0 for all j ∈ F i \ {j * } to relax the constraint j∈F i x ij ≤ j∈F i D ij q ij as possible. In the model setting, we implicitly assume that the customers are willing to corporate with the decision maker to maximize the total utility gain. Although this assumption is highly impractical, the rationality of this model depends on what metric we use to measure the system performance. If the goal is to estimate what is expectation of total utility in practice, then the following questions should be addressed and the corresponding aspects should be properly modeled: What is the service policy used by each service center (FIFO or some other policies)? How to characterize customers' behavior and the mechanism of competing for the limited service capacity? If their most favored service center does not have any capacity, are they willing to accept the service from the less preferred locations with less utility gain? Incorporating all these factors into the model can easily make it very complicated, and meanwhile a large amount of customers' information are needed to drive this approach of modeling. On the other side, if the goal is to access what is the maximum potential utility that can be achieved by the system in the most ideal situation, then the (DDSL) model can be used to give an estimation. A possible modification of modeling the decision-dependent demand is to add a perturbation term to the demand based on the number of opened service centers. Specifically, we can modify the constraint i∈F i where a i is a parameter that measures the influence of opening one more service center on the demand. In this way of modeling, it is assumed that the marginal increase of demand may depend on the number of opened service centers. In reality, when potential customers see more chain stores are opened in the neighborhood, they may have higher intention to try one of them. In this case, the parameter a i can be a random parameter with certain level of ambiguity in the distributionally-robust extension of the model. The deterministic service center location model with decision dependent demand can be further extended to a distribtuionally-robust two-stage stochastic program after imposing an ambiguity set on the pairwise demand D ij . This extension is motivated by the fact that estimation of the demand parameters D ij could be inaccurate. Therefore, we can treat D ij as random parameters with an unknown joint probability distribution, and apply the distributionally-robust framework on this service center location problem with uncertainty. We assume a finite support of the joint INFORMS Journal on Optimization 00(0), pp. 000-000, © 0000 INFORMS probability distribution of the pairwise random demand vector D := {D ij | i ∈ S, j ∈ F i }. Let the finite support be based on |Ω| samples written as D ω = D ω ij i ∈ S, j ∈ F i for all ω ∈ Ω. In this case, any probability distribution of D can be represented as a |Ω|-dimensional vector. We define a nominal probability distribution µ 0 of D as In the vector representation, we write it as µ 0 = [µ ω 0 : ω ∈ Ω]. The ambiguity set of candidate joint probability distribution of D is defined based on the total variation distance between two probability distributions. Specifically we consider an ambiguity set of the following form: Based on the above definition of ambiguity set, the distributionally-robust two-stage stochastic extension of the service center location model can be formulated as follows: where the recourse function Q(y, D ω ) for scenario ω is given by x Using a standard technique for two-stage stochastic programming, we can decompose (DRO-FL) into a master problem and scenario sub-problems. The original problem (DRO-FL) can be solved iteratively. In each iteration, we solve the current master problem in the space of y and pass the current master solution to each scenario sub-problem. After solving each scenario sub-problem for the fixed first-stage solution, we can generate a valid inequality for each scenario using the optimal dual values associated with constraints of the scenario sub-problem. Then we aggregate the valid inequalities from all scenario sub-problems using the worst-case measure of scenarios to get a single cut for the master problem which is an optimality cut. The optimality cut is added to the master problem in the next iteration. Specifically, the master problem at iteration n can be represented as max η where µ (k) is the iteration based worst-case probability measure on scenarios at iteration k ∈ [n − 1]. The way of determining this worst-case probability measure is given in (10). At iteration n we solve the master problem (Master) and obtain the current optimal first-stage solution y (n) . This y (n) is input into every second-stage scenario sub-problem. The second-stage linear program is ω,ij ≥ 0 for all i ∈ S, j ∈ F i are optimal dual values corresponding to the constraints (7b), (7c), (7d) and (7e) The strong duality implies that when evaluating Q(y, D ω ) at y (n) , we get The worst-case probability measure µ (n) is an optimal solution of the following linear program: where Q(y (n) , D ω ) is the value function evaluated at the first stage solution y (n) and scenario ω. Once the current worst-case probability measure is obtained, the following inequality will be added to the first-stage master problem: The algorithm and convergence property for solving (DRO-FL) are given in Appendix 5. It can be shown that for a given first-stage solution y, the scenario sub-problem can be solved using a greedy algorithm. INFORMS Journal on Optimization 00(0), pp. 000-000, © 0000 INFORMS We consider a special case of (DRO-FL) in which each candidate service center has sufficient capacity for service. In this case, the capacity constraints (7b) can be removed from the secondstage scenario problems. We will show that this leads to a simplified scenario problem that admits a closed form optimal solution and optimal objective for the second-stage problem with a mild regularity condition on the parameters. Then after dualizing the inner minimization problem over the probability measure on scenarios, (DRO-FL) can be reformulated as a mixed 0-1 linear program. The single-stage reformulation result is given by the following theorem. Definition 1. The utility gain and demand are consistent if for every ω ∈ Ω, i ∈ S and F ⊆ F i there exists a j ∈ F such that u ij = max j ∈F u ij and D ω ij = max j ∈F D ω ij . The consistency condition for utility gain and demand says that the location that attracts the most demand over other locations should also correspond to the highest utility. Theorem 1. Suppose the utility gain and demand are consistent. In the uncapacitated case, the distributionally-robust service center location problem (DRO-FL) with decision dependent demand based on the maximum attraction principle and the ambiguity set (6) can be reformulated as the following mixed 0-1 linear program: Proof. Without capacity constraints, it is easy to see that the optimal objective value of the scenario problem is given by where we use the assumption that the utility gain and demand are consistent. The problem (DRO-FL) becomes the following: The terms involved in the inner minimization problem of (14) can be written as Taking the dual of the above linear program with respect to the probability measure µ, we obtain the following inner problem: To linearize the term max j∈F i u ij D ω ij y j , we can introduce binary indicator variables s ij , and reformulate the first constraint as follows where we implicitly use the consistency condition of utility and demand which implies argmax j∈F i u ij D ω ij y j is scenario independent and hence the variable s ij . Incorporate with the outer maximization yields the reformulation (12). INFORMS Journal on Optimization 00(0), pp. 000-000, © 0000 INFORMS We establish a regression model based on the maximum attraction principle for estimating utility value u ij and demand D ij that are input parameters to the service center location model (DRO-FL). As a by-product, this regression model can also be used to estimate the size parameter d of the ambiguity set (6). The regression model requires samples of survey among potential customers from a site i ∈ S on their ratings of multiple candidate service center locations. Before formulating the regression model, we first introduce how the samples of survey are collected for fitting the regression model. Suppose N residents have been randomly selected from the site i, and they are viewed as potential customers of service centers under planning. Each of them is asked to give a score in the range {0, 1, . . . , q} to each candidate service center location in F i . The score that a customer is assigned to a location j ∈ F i is taken as the potential utility gained by the customer if going to the service center at j. Score value 0 means the customer is unwilling to get service from the corresponding location. Since the regression model structure is identical for each i ∈ S, we omit the customer site index i and re-write F i as G in the following modeling and analysis. Suppose the indices of location in G are labeled as G := {1, 2, . . . , g}, the N residents are labeled as {1, 2, . . . , N }, and the score assigned to a candidate location j ∈ G by the resident k is denoted as a kj . The maximum attraction principle described in (2) may not be satisfied in reality. But it is possible to establish a utility-demand estimation model that approximately meets the maximum attraction principle. For example, we can verify whether the samples from survey satisfy the maximum attraction principle by grouping the customers who have taken the survey as follows: where [N ] := {1, . . . , N }. In words, V j is the set of residents who are willing to go to a service center located at j. According to (2), the samples satisfy the maximum attraction principle exactly if there exists a permutation σ on the indices in G such that the following inclusive condition holds: A logic behind the maximum attraction principle is that if two candidate locations have similar features, then if a customer is attracted by one location, the customer should also be attracted by the other location with high chance. In this case, the difference in the attractability of the two candidate locations viewed by the customer is reflected in the score assigned to the two locations by the customer. The scores in this case are both non-zero indicating that the customer is willing to visit any of them. Only in the case that two candidate locations have some substantial differences (i.e., one is too far away from the customer site or one is located at a bad community), a customer will be willing to visit one location (assigning a non-zero score to it) while unwilling to visit the other one (assigning a zero score to it). We establish an inclusive chain model to represent the set-level realization of the maximum attraction principle. First we build the subsets (16) using collected samples, and the we sort the indices in where σ is a permutation on G that makes this condition hold. For clarity we assume that |V 1 | ≤ |V 2 | ≤ . . . ≤ |V g | without loss of generality. The inclusive chain model assumes that the score assigned to the candidate locations in G by a customer from a fixed location should match with one of the following patterns: where in the i-th pattern, first i − 1 scores are all zero and the remaining g − i + 1 scores are all non-zero. By convention, the (g + 1)-th pattern is just [0, . . . , 0]. In a probabilistic flavor, the inclusive chain model (ICM) for the score vector can be formally established as a statistical model presented as where ξ is the random score vector, B is an indicator random variable that selects the pattern matching with the score vector, p j is the probability of score vector matching with the j-th pattern, and π (i) jr r = 1, . . . , q gives the probability distribution of the random score Q Proof. We prove it contradiction. Suppose there exist subsets V i and V j (with i < j) such that V i \ V j = ∅ with some positive probability. Suppose k is the customer who is willing to visit V i but not V j , and let ξ k be the score vector of this customer. The definition of V i implies that ξ i ≥ 1 but ξ k / ∈ V j implies ξ k j = 0. On the other side, for every pattern vector ζ we should have 1{ζ i > 0} ≤ 1{ζ j > 0} almost surely, which implies that 1{ξ k i > 0} ≤ 1{ξ k j > 0} almost surely. But this contradicts to ξ k i ≥ 1 and ξ k j = 0. INFORMS Journal on Optimization 00(0), pp. 000-000, © 0000 INFORMS We consider the problem of fitting the inclusive chain model with the collected samples (score vectors). The first step is to determine the rank (level of attracability) of candidate locations G involved in the inclusive chain. An empirical method by simply sorting the cardinality of V i can be used to achieve this. The probability guarantee of this method which will be discussed in a moment. For now, assume that the order has been identified, and suppose |V 1 | ≤ |V 2 | ≤ . . . |V g | without loss of generality. So empirically, this implies that the rank is 1, 2, . . . , g sorted by level of attractability from low to high. Based on this information, we can establish an inclusive chain model as (ICM), but this model is not capable to handle score vectors in which there is at least one zero-value entry between two non-zero value entries, i.e., [0, . . . , 0, 5, 7, 0, 0, 7, 6, 9] . We call a score vector defective if it can not match with any pattern vector in (18) valid for the inclusive chain model. Note that to convert the defective score vector [0, . . . , 0, 5, 7, 0, 0, 7, 6, 9] into a qualified pattern one can either add two 1 score entries to the middle ([0, . . . , 0, 5, 7, 0, 0, 7, 6, 9] → [0, . . . , 0, 5, 7, 1, 1, 7, 6, 9]) or remove the first two nonzero entries ([0, . . . , 0, 5, 7, 0, 0, 7, 6, 9]→[0, . . . , 0, 7, 6, 9]). The first approach has added a total 2 units of score while the second one has removed a total 12 units of score, and it can be verified that the first approach is the one that makes the minimum total score change among all possible changes that can convert this score vector into a qualified pattern. Based on this rule, we can define the defective score units for an arbitrary score vector as follows. Definition 2. For a given arbitrary score vector v, the defective score quantity is given as w is a qualified pattern of the inclusive chain model. Observation 1. The defective score quantity has an upper bound g − 1. Let H s be the event that the defective score quantity of score vector is s, and m s is the probability of this event. After introducing these additional parameters, the inclusive chain model can then be adjusted to incorporate all defective score vectors as follows (AICM) The above model is referred as the adjusted inclusive chain model (AICM). 3.2.1. A theoretical probability guarantee of identifying the order of the inclusive chain model As mentioned at the beginning of Section 3.2, a simple sorting method can be used to identify the rank of the candidate locations in G for the AICM. We sort the cardinality of subsets where a k is the score vector of the k-th customer participated in the survey, and a kj is its j-th element. The following proposition provides a probability guarantee on identifying the order of inclusive chain model given that samples obey the AICM. Proposition 2. Suppose the set of N samples collected from survey follow the AICM, Let the set of candidate locations be G = {1, . . . , g}. Let E be the event that the rank of inclusive chain model can be successfully identified by sorting the cardinality of V j (j ∈ G). Let p * = min j∈G p j . If Proof. We first show that with the given assumptions, the rank of the inclusive chain model can be identified almost surely as the total number of samples N goes to infinity. In the analysis, we let σ be the permutation that gives the rank of locations, i.e., σ(1) < . . . < σ(g) is the rank of locations from low to high. Let N j be the number of samples that match with the j-th pattern Based on the probability setting of AICM, and the law of large numbers we should have the following asymptotic relation hold: This implies that N j > 0 almost surely as N → ∞. By the definition of V j , we should have the following bounds for every |V j | For any i, j satisfying σ(i) < σ(j), it suffices to show that |V σ(i) | < |V σ(j) | as N → ∞. Indeed we INFORMS Journal on Optimization 00(0), pp. 000-000, © 0000 INFORMS and σ(j) k=σ(i)+1 mp k − (1 − m 0 ) ≥ mp * − (1 − m 0 ) > 0 by assumption. Therefore the rank of the inclusive chain can be identified almost surely as N goes to infinity. Now we prove the nonasymptotic result. Note the probability P (E) can be lower bounded as follows Hoeffding's inequality gives the following bound Then it follows that P To avoid redundant notations in the following analysis, we assume that candidate locations are ranked as 1 < . . . < g based on attractability. To fit the AICM with samples, a likelihood function and some regularity terms need to be established. Note that without consideration of any regularity, all model parameters can be estimated naturally using a frequency count approach based on the given samples. For example, an empirical estimation of p j can bep j = N j /N , and an empirical estimation of π (i) jr can beπ jr is the number of samples that have Q (i) j = r. However, this estimation can lead to overfitting of new samples. We now focus on establishing a likelihood function and introduce some regularity terms to prevent overfitting. First, the parameter m s can be estimated as m s = #{k∈[N ]: ξ k ∈Hs} N for all s ∈ {1, . . . , g − 1}. All defective samples can be further used to estimate other parameters with more information. If a sample ξ k is defective, it can be converted into a qualified score vector ξ k by imposing a defective score quantity (Definition 2) to ξ k , and hence the original samples {ξ k } N k=1 can then be converted into a set of qualified score vectors {ξ k } N k=1 . We can evaluate the probability of generating ξ k by the model using the model parameters, and denote this probability as P (ξ k |p, π). Then the (logarithmic) likelihood function of generating {ξ k } N k=1 can then be formulated as Note that maximizing the above likelihood function with only probability-normalization constraints leads to a frequency estimation of all parameters. Some regularity terms can be added to balance this trend. First, for score vectors in the same pattern, parameters that lead to smaller variance of score on a given candidate location are preferred, and hence the following regularity terms can be added (with some regularity coefficients): ∀i ∈ {1, . . . , g}, ∀j ∈ {i, . . . , g}. Second, consider any two consecutive locations j and j + 1 in the inclusive chain. The two locations are both included in patterns H 1 , . . . , H j . In the pattern H i (1 ≤ i ≤ j), the mean score difference between the two locations is given by For every pattern that contains locations j and j + 1, we expect the above mean score difference to be similar. To enforce this similarity, we can add the following regularity terms (with some regularity coefficients): After incorporating the above regularity terms, the parameters can be estimated by solving the following constrained minimization problem of the loss function. min L(p, π) := 2λ 1 g(g + 1) where λ 1 and λ 2 are regularity coefficients that can be tuned using a cross-validation approach. Note that we add a negative sign to the logarithmic likelihood function when it is incorporated into the loss function L(p, π). Notice that (24) is a nonconvex optimization problem with continuous parameters and simple constraints, and it can be solved to local optimality very efficiently using numerical optimization solvers. INFORMS Journal on Optimization 00(0), pp. 000-000, © 0000 INFORMS Once the regression problem (24) is solved to give an estimation of the AICM parameters, the AICM can be used to estimate the (mean) utility value, guide the generation of demand samples. First, the utility value u ·,j (the customer site index is ignored since the AICM is established for any given customer site) can be estimated as follows: wherep andπ are the estimation of AICM parameters. Generating a number of joint demand samples and use them to construct the ambiguity set can be a challenging problem in practice. There are two major obstacles in achieving this. (a) Scaling factor: since only a limited amount of survey samples are collected, how to scale the demand estimation based on the samples to roughly estimate the total demand from a customer site? (b) Demand catenation: how to catenate the estimated demand samples for each customer site to get joint demand vectors (for all customer sites) as samples that can be directly input to the ambiguity set? We try to give a few guidelines for dealing with these problems in this paper. For (a), the total potential demand from a customer site could be estimated via a different channel which is not the focus of this paper. Some investigation on the demand estimation and especially the healthcare service demand estimation is conducted in (Bajari and Benkard 2005 , Cote and Stephen 2001 , Doi et al. 2017 , Griffin et al. 2008 . The survey of scoring on candidate locations are only used to quantify the attractability of each candidate location that can lead to different estimated demand for different candidate locations when the total potential demand is given. In the next paragraph for addressing (b), we provide a estimation of demand associated with each candidate location based on assuming the total potential demand is N (the number of collected samples). In the case that we have a separate estimation A (can be a random variable) of total potential demand the scaling factor can be defined as A N , and it can be used to scale up estimated demand associated with different candidate locations. In the following discussion for (b), we assume that the demand samples generated (using Algorithm 1-2) will be multiplied by the scaling factor (could be customer site dependent), and omit this factor in these algorithms. This way of constructing samples (scenarios) has been used in . Now let us focus on how to generate samples for each D i . The pseudo code for generating a sample for D i is given in Algorithm 1. Note that by repeatedly call this algorithm, one can generate any given number of samples for D i . A pseudo-code for generating a sample for D is given as Algorithm 2. Algorithm 1 An algorithm for generating a mixed-defective sample for Di (i ∈ S) following AICM. 1: Input data: score vector samples Yi = { ξ k } N k=1 that have been collected from the survey at the customer site i. 2: Determine the rank of ICM based on the score vectors in Yi. Sort the indices in Fi (from low to high) based on this rank. 3: Based on this rank, partition Yi into qualified subset Y q i of samples and defective subset Y d i of samples. Let . 4: Solve (24) to get an optimal estimation of probability parameters denoted as p * , π * and substitute p * , π * into (ICM). 5: Generate N i.i.d. score vectors denoted as {ξ k } N k=1 such that each score vector is generated using the way given as follows: with probability ρ sample ξ k following the fitted (ICM), and with probability 1 − ρ sample ξ k uniformly from Y d i . 6: Let Dij = card{k | ξ k j ≥ 1} for all j ∈ Fi. 7: Return the sample Di = [ Dij : j ∈ Fi]. Algorithm 2 An algorithm for generating a mixed-defective sample for D following AICM. 1: Input data: (1) same input as Algorithm 1 for all i ∈ S. (2) sample size N i for every i ∈ S and N . 2: Generate N i i.i.d. mixed-defective samples for every i ∈ S using Algorithm 1. The numerical investigation consists of two major parts: the numerical analysis of the AICM and the computational performance of the (DRO-FL) model. In Section 4.1, we provide numerical analysis of the AICM, where we simulate certain amount of score vector samples and use them to fit the AICM. In Section 4.2, we show the computational performance of solving generated instances of the (DRO-FL) model, and provide sensitivity analysis of the robust optimal locations with respect to the sample size and the choice of radius in the ambiguity set. For the numerical analysis of AICM, we focus on a specific customer site with 4 candidate service center locations labeled as L 1 ∼ L 4 (the rank of attractability is unknown). We consider 6 possible scores {0, 1, . . . , 5}. We simulate N i.i.d. score vectors with some defected ones and use them to fit the AICM. The generation of each score vector is based on the following procedures. First, we simulate a 4-dimensional real vector v following the Gaussian distribution N (w, Σ) with w = [1, 3, 4.5, 2] corresponding to the mean score of L 1 ∼ L 4 respectively, and Σ = diag[0.8 2 , 1.5 2 , 1, 1.5 2 ] as the covariance matrix. Next, each entry of v is rounded to the closest integer from {0, 1, . . . , 5}. Finally, with probability 0.8 we accept this score vector, while with probability 0.2 we randomly select an entry of v, set it to be zero and accept the resulting score vector. We follow the steps instructed in Section 3.2 to fit the AICM. An essential step is to solve the nonlinear optimization problem (24). This problem is implemented in Julia 1.5.2 (Bezanson et al. 2017) with optimization package JuMP dev (Dunning et al. 2017 ) as the modeling interface and Ipopt 3.13.2 (Wächter and Biegler 2006) as the solver. For simplicity, we let λ 1 = λ 2 = λ in (24). We use a 4-fold cross validation approach to select an appropriate regularization parameter λ. The parameter selection consists of two rounds. The first round is a rough selection, where the candidate λ (1) is selected from {0, 0.1, 0.2,. . . , 5} based on the out-of-sample performance. Suppose λ (1) * is the selected value after the first round. In the second round, we select an optimal λ * from the set λ (1) * ± 0.01k k = 0, 1, . . . , 10 based on the out-of-sample performance. The out-of-sample performance is measured by the averaged logarithm of likelihood. The first-round selection results are given in Figure 1 for three different sample sizes N = 50, 100, 200. The second-round selection results are given in Table 2 , from which we see that the optimal λ is 2.98, 2.98 and 0.37 for N = 50, 100, 200, respectively. The curves in Figure 1 indicate that for a small sample size (N = 50) a larger regularization parameter lead to a better out-of-sample performance, while for a larger sample size (N = 200) the best out-of-sample performance is achieved at a smaller regularization parameter. This behavior is intuitive since training the model with a small sample size can lead to over fitting if the model parameters are not sufficiently regularized. When more samples are available to train the model, the regularization can be less restrictive. First round of λ selection. Finally, we fit the AICM using N samples and the optimal λ parameter. The probability parameters p and π from AICM can be used to estimate the probability distribution on scores for each candidate location. The results are given in Figure 2 for the 4 candidate locations. The probability distribution on score values reveal that the rank of attractability from high to low is: Location 3 > Location 2 > Location 4 > Location 1, which matches with the underline stochastic model that generates the scores. The mean scores corresponding to the four locations are given in Table 1 . We investigate the computational performance of solving numerical instances of the (DRO-FL) model. To generate numerical instances, we create a 100 × 100 two-dimensional square as the map. The customer sties are points inside the square, and the coordinates of each customer site are uniformly drawn from the range [0, 100] 2 . We let F = S, which means each customer site can be a candidate location of a service center. We sort the set of distances W = {d(i, j) | ∀i ∈ S, ∀j ∈ S, i < j} from small to large, and let d c be the 5% quantile of the sorted list. For every customer site i, we select all customer sites (including i) that are within d c distance from i to form the set F i . We let the utility completely depend on the distance. Specifically, we set u ii = 5 for all i ∈ S. We also let the first element in W correspond to utility 5, the last element in W correspond to utility 0.5, and any other elements correspond to utility values obtained using a linear interpolation. For a sample ω, the demand D ω ij is drawn from the normal distribution N (120u ij , (12u ij ) 2 ). The capacity of each candidate service center is set to be 1000. We assume the cost of opening each service center (a) Probability distribution on scores of location 1. (b) Probability distribution on scores of location 2. (c) Probability distribution on scores of location 3. (d) Probability distribution on scores of location 4. Probability distribution on scores for each location at different sample sizes. is identical (an unit cost), and the budget B in this case is equivalent to the maximum number of service centers that can be opened. We have generated 18 instances of (DRO-FL), and each instance is characterized by the number |S| of customer sites (also candidate locations) and the budget B (number of maximum service centers that can be allocated). For each instance, we have tried three different sample-size options (N = 100, 200, 500), and the total variation distance is set to be 0.2 in these calculation. The (DRO-FL) model is implemented and solved using a Gurobi Python interface (Python 3.7 and Gurobi 9.0). For every instance, the master problem and all scenario sub-problems are solved using a single 2.50GHz CPU. The computational results for the 12 instances are shown in Table 3 . It is shown that all instances have been solved to optimality within 15 master iterations (see Algorithm 3). The first 9 instances are solved within 120 seconds, and large instances are solved within 1.5 hours where the majority of computational time is spent on the scenario sub-problems. The range of the robust optimal objective is less than 0.7% across Table 3 Computational performance of solving 12 numerical instances of (DRO-FL). In all instances, the total variation distance for the ambiguity set is set to be 0.2. All instances are solved to optimality. three different sample-size options for each instance, and we have also observed that the robust optimal solution remains the same for different sample-size options. We further investigate the impact of the total variation distance d on the robust optimal decision and the objective value. In this study, we focus on the two instances (|S|, B) = (60, 20) and (|S|, B) = (60, 10). For a given sample size N (with N = 100, 200, 500), we tune the total variation distance d from 0 to 0.4 with an increment 0.05 and investigate the change of optimal objective of (DRO-FL). The results are shown in Figure 3 . Obviously, the objective should decrease with d as a larger d value can allow the worse scenarios get more weight and hence makes the objective decrease. With d increasing from 0 to 0.4, the decrement of the objective is 1.5% (1000 units) for the case B = 20, and 1.9% (700 units) for the case B = 10. The relative decrement is about one magnitude smaller compared to the 10% relative deviation in generating samples for the demand. It is also observed that the optimal decision of (DRO-FL) does not change with the sample size or the total variation distance. The distributionally-robust service center location problem investigated in this research possesses decision-dependent demand induced naturally from a maximum attraction principle and the number of opened service centers in a neighborhood. The ambiguity set considered in this work is a decision-independent one defined using the total-variational distance while many other metics such as the Wasserstein distance, Phi-divergences, etc. can also be applied to define the ambiguity set with minor modification in the formulation. The modeling approach in this work decouples the endogenous impact (decision dependency) and distributional ambiguity. As a result, the model is The impact of total variation distance on the robust optimal objective of (DRO-FL). highly computationally efficient for mid-and large scale instances compared to models with an endogenous uncertainty (ambiguity) set. In practice, such decoupling is likely to be more data driven, as there is a higher chance to identify data-based evidence that supports decision-dependent demand than decision-dependent uncertainty of demand. The model (DRO-FL) is a special case of the distributionally-robust two-stage stochastic program with a polyhedral ambiguity set. Algorithm 3 A cutting-plane algorithm for solving (DRO-FL). 1: Initialization: η * ← ∞, y * ← 0 |F | , n ← 0, y (n) ← 0 |F | , η (n) ← ∞ and SolSet ← ∅. 2: while y (n) is not in SolSet. do SolSet ← SolSet ∪ {y (n) }. Set n ← n + 1. Solve the master problem (Master) for iteration n, and let y (n) be the optimal solution. Evaluate the value function Q(y (n) , D ω ) for each ω ∈ Ω. Solve the linear program (10) to get the worst-case probability measure µ (n) . Add the inequality (11) to (Master). 9: end while 10: Set y * ← y (n) , and η * ← η (n) . 11: Return y * as the optimal solution and η * as the optimal objective. Theorem 2 (Bansal et al. 2018) . Algorithm 3 terminates in a finite number of iterations and return an optimal solution and optimal objective of (DRO-FL). Strategic planning under uncertainty: stochastic integer programming approaches The facility location problem with Bernoulli demands Demand estimation with heterogeneous consumers and unobserved product characteristics: a hedonic approach Decomposition algorithm for two-stage distributionally robust mixed binary programs Facility location: a robust optimization approach Distributionally robust facility location problem under decisiondependent stochastic demand Robust facility location problem for hazardous waste transportation Julia: A fresh approach to numerical computing Robust defibrillator deployment under cardiac arrest location uncertainty via row-and-column generation Four methodologies to improve healthcare demand forecasting Estimation and evaluation of future demand and supply of healthcare services based on a patient access area model A two-echelon stochastic facility location model for humanitarian relief logistics Jump: A modeling language for mathematical optimization A stochastic programming approach to planning of offshore gas filed developments under uncertainty in reserves A Lagrangian duality based branch and bound for solving linear stochastic programs with decision dependent uncertainty A class of stochastic programs with decision dependent uncertainty A novel branch and bound algorithm for optimal development of gas fields under uncertainty in reserves A distributionally robust optimization approach for two-stage facility location problems Optimization of community health center locations and service offerings with statistical need estimation Robust strategies for facility location under uncertainty Solution strategies for multistage stochastic programming with endogenous uncertainty Multisourcing supply network design: two-stage chanceconstrained model, tractable approximations, and computational results Facility location decisions with random disruptions and imperfect estimation A dual-based procedure for stochastic facility location Distributionally robust service center location problem with decision dependent utilities Service center location problem with decision dependent utilities Distributionally robust optimization with decision-dependent ambiguity set A model of supply-chain decisions for resource sharing with an application to ventilator allocation to combat COVID-19 Facility location under demand uncertainty: response to a largescale bio-terror attack Robust optimization with time-dependent uncertainty in radiation therapy Optimization under decision-dependent uncertainty Distributionally robust optimization with decision-dependent ambiguity set Variational theory for optimization under stochastic ambiguity A reliability-based stochastic traffic assignment model for network with multiple user classes under uncertainty in demand Facility location under uncertainty: a review Robust software partitioning with multiple instantiation A multistage stochastic programming approach with strategies for uncertainty reduction in the synthesis of process networks with uncertain yields Stochastic programming approach for planning of offshore oil or gas field infrastructure under decision-dependent uncertainty Computational strategies for non-convex multistage minlp models with decision-dependent uncertainty and gradual uncertainty resolution Endangered aquifers: Ggroundwater management under threats of catastrophic events Decision rules for information discovery in multi-stage stochastic programming Investing in the links of a stochastic network to minimize expected shortest path length On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming Algorithms for a facility location problem with stochastic customer demand and immobile servers The author is grateful to Dr. Liwei Zeng for a valuable discussion on getting an interpretation of the model.