key: cord-0228310-y3lhze9y authors: Shehadeh, Karmel S. title: Distributionally Robust Optimization Approaches for a Stochastic Mobile Facility Fleet Sizing, Routing, and Scheduling Problem date: 2020-09-23 journal: nan DOI: nan sha: 4f75faef4a37f961a06160b60bda22250ff2fc44 doc_id: 228310 cord_uid: y3lhze9y We propose two distributionally robust optimization (DRO) models for a mobile facility (MF) fleet sizing, routing, and scheduling problem (MFRSP) with time-dependent and random demand, as well as methodologies for solving these models. Specifically, given a set of MFs, a planning horizon, and a service region, our models aim to find the number of MFs to use (i.e., fleet size) within the planning horizon and a route and schedule for each MF in the fleet. The objective is to minimize the fixed cost of establishing the MF fleet plus a risk measure (expectation or mean conditional value-at-risk) of the operational cost over all demand distributions defined by an ambiguity set. In the first model, we use an ambiguity set based on the demand's mean, support, and mean absolute deviation. In the second model, we use an ambiguity set that incorporates all distributions within a 1-Wasserstein distance from a reference distribution. To solve the proposed DRO models, we propose a decomposition-based algorithm. In addition, we derive valid lower bound inequalities that efficiently strengthen the master problem in the decomposition algorithm, thus improving convergence. We also derive two families of symmetry-breaking constraints that improve the solvability of the proposed models. Finally, we present extensive computational experiments comparing the operational and computational performance of the proposed models and a stochastic programming model, demonstrating where significant performance improvements could be gained and derive insights into the MFRSP. A mobile facility (MF) is a facility capable of moving from one place to another, providing real-time service to customers in the vicinity of its location when it is stationary (Halper and Raghavan 2011) . In this paper, we study a mobile facility fleet sizing, routing, and scheduling problem (MFRSP) with stochastic demand. Specifically, in this problem, we aim to find the number of MFs (i.e., fleet size) to use in a given service region over a specified planning horizon and the route and schedule for each MF in the fleet. The demand level of each customer in each time period is random. The probability distribution of the demand is unknown, and only partial information about the demand (e.g., mean and range) may be available. The objective is to find the MF fleet size, routing, and scheduling decisions that minimize the sum of the fixed cost of establishing the MF fleet, the cost of assigning demand to the MFs (e.g., transportation cost), and the cost of unsatisfied demand (i.e., shortage cost). The concept of MF routing and scheduling is very different than conventional static facility location (FL) and conventional vehicle routing (VR) problems. In static FL problems, we usually consider opening facilities at fixed locations. Conventional VR problems aims at handling the movement of items between facilities (e.g., depots) and customers. A mobile facility is a facility-like vehicle that functions as a traditional facility when it is stationary, except that it can move from one place to another if necessary (Lei, Lin, and Miao 2014) . Thus, the most evident advantage of MFs over fixed facilities is their flexibility in moving to accommodate the change in the demand over time and location (Halper and Raghavan 2011 , Lei, Lin, and Miao 2014 . MFs are used in many applications ranging from cellular services, healthcare services, to humanitarian relief logistics. For example, light trucks with portable cellular stations can provide cellular service in areas where existing cellular network of base stations temporarily fails (Halper and Raghavan 2011) . Mobile clinics (i.e., customized MFs fitted with medical equipment and staffed by health professionals) can travel to rural and urban areas to provide various (prevention, testing, diagnostic) health services. Mobile clinics also offer alternative healthcare (service) delivery options when a disaster, conflict, or other events cause stationary healthcare facilities to close or stop operations (Blackwell and Bosse 2007 , Brown-Connolly, Concha, and English 2014 , Du Mortier and Coninx 2007 , Gibson et al. 2011 , Oriol et al. 2009 , Song et al. 2013 . For example, mobile clinics played a significant role in providing drive-through COVID-19 testing sites or triage locations during the COVID-19 pandemic. In 2019, the mobile health clinic market was valued at nearly 2 billion USD and is expected to increase to ∼12 billion USD by 2028 (Life Line Mobile Blog 2021). In humanitarian relief logistics, MFs give relief organizations the ability to provide aid to populations dispersed in remote and dense areas. These examples motivate the need for computationally efficient optimization tools to support decision-making in all areas of the MF industry. MF operators often seek a strategic and tactical plan, including the size of the MF fleet (strategic) and a routing plan for each MF in the fleet (tactical and operational) that minimize their fixed operating costs and maximize demand satisfaction. Determining the fleet size, in particular, is very critical as it is a major fixed investment for starting any MF-based business. The fleet sizing problem depends on the MF operational performance, which depends on the routing and scheduling decisions. The allocation of the demand to the MFs is also very important for the entire system performance (Lei, Lin, and Miao 2016) . For example, during the COVID-19 pandemic, Latino Connection, a community health leader, has established Pennsylvania's first COVID-19 Mobile Response Unit, CATE (i.e., Community-Accessible Testing & Education). The goal of CATE is to provide affordable and accessible COVID-19 education, testing, and vaccinations to low-income, vulnerable communities across Pennsylvania to ensure the ability to stay safe, informed, and healthy (CATE 2021) . During COVID-19, CATE published an online schedule consisting of the mobile unit stops and the schedule at each stop. A model that optimizes CATE's fleet size and schedules considering demand uncertainty could help improve CATE's operational performance and achieve better access to health services. Unfortunately, the MFRSP is a challenging optimization problem for two primary reasons. First, customers' demand is random and hard to predict in advance, especially with limited data during the planning process. Second, even in a perfect world in which we know with certainty the amount of demand in each period, the deterministic MFRSP is challenging because it is similar to the classical FL problem (Halper and Raghavan 2011, Lei, Lin, and Miao 2014) . Thus, the incorporation of demand variability increases the overall complexity of the MFRSP. However, ignoring demand uncertainty may lead to sub-optimal decisions and, consequently, the inability to meet customer demand (i.e., shortage). Failure to meet customer demand may lead to adverse outcomes, especially in healthcare, as it impacts population health. It also impacts customers' satisfaction and thus the reputation of the service providers and may increase their operational cost (due to, e.g., outsourcing the excess demand to other providers). To model uncertainty, Lei, Lin, and Miao (2014) assumed that the probability distribution of the demand is known and accordingly proposed the first a priori two-stage stochastic optimization model (SP) for a closely related MFRSP. Although attractive, the applicability of the SP approach is limited to the case in which we know the distribution of the demand or we have sufficient data to model it. In practice, however, one might not have access to a sufficient amount of high-quality data to estimate the demand distribution accurately. This is especially true in application domains where the use of mobile facilities to deliver services is relatively new (e.g., mobile COVID-19 testing clinics). Moreover, it is challenging for MF companies to obtain data from other companies (competitors) due to privacy issues. Finally, various studies show that different distributions can typically explain raw data of uncertain parameters, indicating distributional ambiguity (Mohajerin Kuhn 2018, Vilkkumaa and Liesiö 2021 ). Suppose we model uncertainty using a data sample from a potentially biased distribution or an assumed distribution (as in SP). In this case, the resulting nominal decision problem evaluates the cost only at this training sample, and thus the resulting decisions may be overfitted (optimistically biased). Accordingly, SP solutions may demonstrate disappointing out-of-sample performance ('black swans') under the true distribution (or unseen data). In other words, solutions of SP decision problems often display an optimistic in-sample risk, which cannot be realized in out-of-sample settings. This phenomenon is known as the Optimizers' Curse (i.e., an attempt to optimize based on imperfect estimates of distributions leads to biased decisions with disappointing performance) and is reminiscent of the overfitting effect in statistics (Smith and Winkler 2006) . Alternatively, one can construct an ambiguity set of all distributions that possess certain partial information about the demand. Then, using this ambiguity set, one can formulate a distributionally robust optimization (DRO) problem to minimize a risk measure (e.g., expectation or conditional value-at-risk (CVaR)) of the operational cost over all distributions residing within the ambiguity set. In particular, in the DRO approach, the optimization is based on the worst-case distribution within the ambiguity set, which effectively means that the distribution of the demand is a decision variable. DRO has received substantial attention recently in various application domains due to the following striking benefits. First, as pointed out by Mohajerin Esfahani and Kuhn (2018) , DRO models are more "honest" than their SP counterparts as they acknowledge the presence of distributional uncertainty. Therefore, DRO solutions often faithfully anticipate the possibility of black swan (i.e., out-of-sample disappointment). Moreover, depending on the ambiguity set used, DRO often guarantees an out-of-sample cost that falls below the worst-case optimal cost. Second, DRO alleviates the unrealistic assumption of the decision-maker's complete knowledge of distributions. Third, several studies have proposed DRO models for real-world problems that are more computationally tractable than their SP counterparts, see, e.g., Basciftci, Ahmed, and Shen (2021) , Luo and Mehrotra (2020) , Saif and Delage (2020) , Shehadeh and Sanci (2021) , Shehadeh and Tucker (2021) , Tsang and Shehadeh (2021) , Wang, Chen, and Liu (2020) , Wang et al. (2021) , Wu, Du, and Xu (2015) . In this paper, we propose tractable DRO approaches for the MFRSP. The ambiguity set is a key ingredient of DRO models that must (1) capture the true distribution with a high degree of certainty, and (2) be computationally manageable (i.e., allow for a tractable DRO model or solution method). There are several methods to construct the ambiguity set. Most applied DRO literature employs moment-based ambiguity Ye 2010, Zhang, Jiang, and Shen 2018) , consisting of all distributions sharing particular moments (e.g., mean-support ambiguity). The main advantage of the mean-support ambiguity set, for example, is that it incorporates intuitive statistics that a decision-maker may easily approximate and change. Moreover, various techniques have been developed to derive tractable moment-based DRO models. However, asymptotic properties of the moment-based DRO model cannot often be guaranteed because the moment information represents descriptive statistics. Recent DRO approaches define the ambiguity set by choosing a distance metric (e.g., φdivergence (Jiang and Guan 2016) , Wasserstein distance (Mohajerin Esfahani and Kuhn 2018, Gao and Kleywegt 2016) ) to describe the deviation from a reference (often empirical) distribution. The main advantage of Wasserstein ambiguity, for example, is that it enable decision-makers to incorporate possibly small-size data in the ambiguity set and optimization, enjoys asymptotic properties, and often offers a strong out-of-sample performance guarantee (Mohajerin Kuhn 2018, Mevissen, Ragnoli, and Yu 2013) . Recent results indicate that Wasserstein's ambiguity centered around a given empirical distribution contains the unknown true distribution with a high probability and is richer than other divergence-based ambiguity sets (in particular, they contain discrete and continuous distributions as compared to, e.g., φ-divergence ball centered at the empirical distribution which does not contain any continuous distribution, and Kullback-Leibler divergence ball, which must be absolutely continuous with respect to the nominal distribution). Despite the potential advantages, there are no moment-based, Wasserstein-based, or any other DRO approaches for the specific MFRSP that we study in this paper (see Section 2). This inspires this paper's central question: what are the computational and operational performance values of employing DRO to address demand uncertainty and ambiguity compared to the classical SP approach for the MFRSP. To answer this question, we design and analyze two DRO models based on the demand's mean, support, and mean absolute deviation ambiguity and Wasserstein ambiguity and compare the performance of these models with the classical SP approach. In this paper, we present two distributionally robust MF fleet sizing, routing, and scheduling (DMFRS) models for the MFRSP, as well as methodologies for solving these models. We summarize our main contributions as follows. 1. Uncertainty Modeling and Optimization Models. We propose the first two-stage DRO models for the MFRSP. These models aim to find the optimal (1) number of MFs to use within a planning horizon, (2) a routing plan and a schedule for the selected MFs, i.e., the node that each MF is located at in each time period, (3) assignment of MFs to customers. Decisions (1)- (2) are planning (first-stage) decisions, which cannot be changed in the short run. Conversely, the assignments of the demand are decided based on the demand realization, and thus are secondstage decisions. The objective is to minimize the fixed cost (i.e., cost of establishing the MF fleet and traveling inconvenience cost) plus the maximum of a risk measure (expectation or mean CVaR) of the operational cost (i.e., transportation and unsatisfied demand costs) over all possible distributions of the demand defined by an ambiguity set. In the first model (MAD-DRO), we use an ambiguity set based on the demand's mean, support, and mean absolute deviation (MAD). In the second model (W-DRO), we use an ambiguity set that incorporates all distributions within a 1-Wasserstein distance from a reference distribution. To the best of our knowledge, and according to our literature review in Section 2, our paper is the first to address the distributional ambiguity of the demand in the MFRSP using DRO. 2. Solution Methods. We derive equivalent solvable reformulations of the proposed mini-max nonlinear DRO models. We propose a computationally efficient decomposition-based algorithm to solve the reformulations. In addition, we derive valid lower bound inequalities that efficiently strengthen the master problem in the decomposition algorithm, thus improving convergence. 3. Symmetry-Breaking Constraints. We derive two families of new symmetry breaking constraints, which break symmetries in the solution space of the first-stage routing and scheduling decisions and thus improve the solvability of the proposed models. These constraints are independent of the method of modeling uncertainty. Hence, they are valid for any (deterministic and stochastic) formulation that employ the first-stage decisions of the MFRSP. Our paper is the first to attempt to break the symmetry in the solution space of these planning decisions of the MFRSP. 4. Computational Insights. We conduct extensive computational experiments comparing the proposed DRO models and a classical SP model empirically and theoretically, demonstrating where significant performance improvements can be gained. Specifically, our results show (1) how the DRO approaches have superior operational performance in terms of satisfying customers demand as compared to the SP approach; (2) the MAD-DRO model is more computationally efficient than the W-DRO model; (3) the MAD-DRO model yield more conservative decisions than the W-DRO model, which often have a higher fixed cost but significantly lower operational cost; (4) how mobile facilities can move from one location to another to accommodate the change in demand over time and location; (5) efficiency of the proposed symmetry breaking constraints and lower bound inequalities; (6) the trade-off between cost, number of MFs, MF capacity, and operational performance; and (7) the trade-off between the risk-neutral and risk-averse approaches. Most importantly, our results show the value of modeling uncertainty and distributional ambiguity. The remainder of the paper is structured as follows. In Section 2, we review the relevant literature. Section 3 details our problem setting. In Section 4, we present our SP. In Section 5, we present and analyze our proposed DRO models. In Section 6, we present our decomposition algorithm and strategies to improve convergence. In Section 7, we present our numerical experiments and corresponding insights. Finally, we draw conclusions and discuss future directions in Section 8. In this section, we review recent literature that is most relevant to our work, mainly studies that propose stochastic optimization approaches for closely related problems to the MFRSP. There is limited literature on MF as compared to stationary facilities. However, as pointed out by Lei, Lin, and Miao (2014) , the MFRSP share some features with several well-studied problems, including Dynamic Facility Location Problem (DFLP), Vehicle Routing Problem (VRP), and the Covering Tour Problem (CTP). First, let us briefly discuss the similarities and differences between the MFRSP and these problems. Given that we consider making decisions over a planning period, then the MFRSP is somewhat similar to DFLP, which seeks to locate/re-locate facilities over a planning horizon. To mitigate the impact of demand fluctuation along the planning period, decision-makers may open new facilities and close or relocate existing facilities at a relocation cost (Albareda-Sambola et al. (2009 ), Antunes et al. (2009 ), Contreras, Cordeau, and Laporte (2011 ), Drezner and Wesolowsky (1991 , Gendron (2015, 2017) , Van Roy and Erlenkotter (1982) ). Most DFLPs assume that the relocation time is relatively short as compared to the planning horizon. In contrast, the MFRSP takes into account the relocation time of MFs. In addition, each MF needs to follow a specific route during the entire planning horizon, which is not a requirement in DFLP. In CTP, one seeks to select a subset of nodes to visit that can cover other nodes within a particular coverage (Current, Velle, and Cohon (1985) , Flores-Garza et al. (2017) , Gendreau, Laporte, and Semet (1997) , Hachicha et al. (2000) , Tricoire, Graf, and Gutjahr (2012) ). In contrast to the MFRSP, CTP does not consider the variations of demand over time and assumes that the amount of demand to be met by vehicles is not related to the length of time the MF is spending at the stop. The VRP is one of the most extensively studied problem in operations research. The VRP also has numerous applications and variants (Subramanyam, Repoussis, and Gounaris 2020) . Both the MFRSP and the VRP consider the routing decisions of vehicles. However, the MFRSP is different than the VRP in the following ways (Lei, Lin, and Miao 2014) . First, in the MFRSP, we can meet customer demand by a nearby MF (e.g., cellular stations). In the VRP, vehicles visit customers to meet their demand. Second, the amount of demand that an MF can serve at each location depends on the duration of the MF stay, which is a decision variable. In contrast, VRPs often assume a fixed service time. Finally, most VRPs require that each customer has to be visited exactly once in each route. In contrast, in the MFRSP, some customers may not be visited, and some may be visited multiple times. Next, we review studies that proposed stochastic optimization approaches to problems similar to the MFRSP. Halper and Raghavan (2011) introduced the concept of MF and proposed a continuous-time formulation to model the maximum covering mobile facility routing problem under deterministic settings. To solve their model, Halper and Raghavan (2011) proposed several computationally effective heuristics. Lei, Lin, and Miao (2014) and Lei, Lin, and Miao (2016) are two closely related (and only) papers that proposed stochastic optimization approaches for MF routing and scheduling. Lei, Lin, and Miao (2014) assumed that the distribution of the demand is known and accordingly proposed the first a priori two-stage SP for MFRSP. Lei, Lin, and Miao (2014) 's SP seeks optimal first-stage routing and scheduling decisions to minimize the total expected system-wide cost, where the expectation is taken with respect to the known distribution of the demand. A priori optimization has a managerial advantage since it guarantees the regularity of service, which is beneficial for both customer and service provider. That is, a prior plan allow the customers to know when and where to receive service and enable MF service providers to be familiar with routes and better manage their time schedule during the day. The applicability of the SP approach is limited to the case in which the distribution of the demand is fully known, or we have sufficient data to model it. Robust optimization (RO) and distributionally robust optimization (DRO) are alternative techniques to model, analyze and optimize decisions under uncertainty and ambiguity (where the underlying distributions are unknown). RO assumes that the uncertain parameters can take any value from a pre-specified uncertainty set of possible outcomes with some structure (Bertsimas and Sim 2004 , Ben-Tal, Den Hertog, and Vial 2015 , Soyster 1973 . In RO, optimization is based on the worst-case scenario within the uncertainty set. Notably, Lei, Lin, and Miao (2016) are the first to motivate the importance of handling demand uncertainty using RO. They argue that RO is useful because it only requires moderate information about the uncertain demand rather than a detailed description of the probability distribution or a large data set. Specifically, Lei, Lin, and Miao (2016) proposed the first two-stage RO approach for MF feet sizing and routing problem with demand uncertainty. Lei, Lin, and Miao (2016) 's model aims to find the fleet size and routing decisions that minimize the fixed cost of establishing the MF fleet (first-stage) and a penalty cost for the unmet demand (second-stage). Optimization in Lei, Lin, and Miao (2016) 's RO model is based on the worst-case scenario of the demand occurring within a polyhedral uncertainty set. By focusing the optimization on the worst-case scenario, RO may lead to overly conservative and suboptimal decisions for other more-likely scenarios (Chen, Sim, and Xiong 2020, Delage and Saif 2021) . DRO models the uncertain parameters as random variables whose underlying probability distribution can be any distribution within a pre-defined ambiguity set. The ambiguity set is a family of all possible distributions characterized by some known properties of random parameters (Mohajerin Esfahani and Kuhn 2018) . In DRO, optimization is based on the worst-case distribution within this set. DRO is an attractive approach to model uncertainty with ambiguous distributions because: (1) it alleviates the unrealistic assumption of the decision-makers' complete knowledge of the distribution governing the uncertain parameters, (2) it is usually more computationally tractable than its SP and RO counterparts Saif 2021, Rahimian and Mehrotra 2019) , and (3) one can use minimal distributional information or a small sample to construct the ambiguity set and then build DRO models. Rahimian and Mehrotra (2019) provide a comprehensive survey of the DRO literature. The computational tractability of DRO models depends on the ambiguity sets. These sets are often based on moment information (Delage and Ye 2010 , Mehrotra and Zhang 2014 , Zhang, Jiang, and Shen 2018 or statistical measures such as the Wasserstein distance (Mohajerin Esfahani and Kuhn 2018). To derive tractable DRO models for the MFRSP, we construct two ambiguity sets of demand, one based on 1-Wasserstein distance and one using the demand's support, mean, and mean absolute deviation (MAD). As mentioned in the introduction, we use the Wasserstein ambiguity because it is richer than other divergence-based ambiguity sets. We use the MAD as a dispersion measure instead of the variance because it allows tractable reformulation and better captures outliers and small deviations. In addition, the MAD exists for some distributions while the second moment does not Hochman 1985, Postek et al. 2018) . We refer to Postek et al. (2018) and references therein for rigorous discussions on properties of MAD. Next, we discuss some relevant results on the mean-support-MAD ambiguity set (henceforth denoted as MAD ambiguity). Ben-Tal and Hochman (1972) derived tight upper and lower bounds on the expectation of a general convex function of a random variable under MAD ambiguity. In particular, when the random variable is one-dimensional, Ben-Tal and Hochman (1972) show that the worst-case distribution under MAD ambiguity is a three-point distribution on the mean, support, and MAD. Recently, Postek et al. (2018) used the results of Ben-Tal and Hochman (1972) to treat ambiguous expected feasibility constraints to obtain exact reformulations for both functions that are convex and concave in the components of the random variable under MAD ambiguity. A reviewer of this paper brought our attention to the results in a working paper by Long, Qi, and Zhang (2021) on supermodularity in two-stage DRO problems. Specifically, Long, Qi, and Zhang (2021) identified a tractable class of two-stage DRO problems based on the scenario-based ambiguity set proposed by Chen, Sim, and Xiong (2020) . They showed that any two-stage DRO problem with mean, support, and upper bounds of MAD has a computationally tractable reformulation Comparison between Lei, Lin, and Miao (2014) , Lei, Lin, and Miao (2016) , and our paper. whenever the second-stage cost function is supermodular in the random parameter. Furthermore, they proposed an algorithm to compute the worst-case distribution for this reformulation. They argued that using the computed worst-case distribution in the reformulation can make the twostage DRO problem tractable. In addition, they provided a necessary and sufficient condition to check whether any given two-stage optimization problem has the property of supermodularity. In Appendix D, we show that even if our recourse is supermodular in demand realization, the number of support points in the worst-case distribution of the demand is large, which renders our twostage MAD-DRO model computationally challenging to solve using Long, Qi, and Zhang (2021)'s approach. In contrast, we can efficiently solve an equivalent reformulation of our MAD-DRO model using our proposed decomposition algorithm. Despite the potential advantages, there are no DRO approaches for the specific MFRSP that we study in this paper. Therefore, our paper is the first to propose and analyze DRO approaches for the MFRSP. In Figure 1 , we provide a comparison between Lei, Lin, and Miao (2014) , Lei, Lin, and Miao (2016) , and our approach based on the assumption made on uncertainty distribution, proposed stochastic optimization approach, decision variables, objectives, and addressing symmetry. We note that our paper and these papers share the common goal of deriving generic optimization models that can be used in any application of MF where one needs to determine the same sets of decisions under the same criteria/objective considered in each paper. We make the following observations from Figure 1 . In contrast to Lei, Lin, and Miao (2016) , we additionally incorporate the MF traveling inconvenience cost in the first-stage objective and the random transportation cost in the second-stage objective. In contrast to Lei, Lin, and Miao (2016) and Lei, Lin, and Miao (2014) , we model both uncertainty and distributional ambiguity and optimize the system performance over all demand distributions residing within the ambiguity sets. Our master and sub-problems and lower bound inequalities have a different structure than those of Lei, Lin, and Miao (2016) due to the differences in the decision variables and objectives. We also propose two families of symmetry-breaking constraints, which break symmetries in the solution space of the routing and scheduling decisions. These constraints can improve the solvability of any formulation that uses the same routing and scheduling decisions of the MFRSP. Lei, Lin, and Miao (2014) and Lei, Lin, and Miao (2016) did not address the issue of symmetry in the MFRSP. Finally, to model decision makers' risk-averse attitudes, we propose both risk-neutral (expectation) and risk-averse (mean-CVaR-based) DRO models for the MFRSP. Lei, Lin, and Miao (2014) and Lei, Lin, and Miao (2016) models are risk-neutral. Finally, it is worth mentioning that our work uses similar reformulation techniques in recent DRO static FL literature (see, e.g., Basciftci, Ahmed, and Shen (2021) , Luo and Mehrotra (2020) , Saif and Delage (2020), Shehadeh and Sanci (2021) , Shehadeh and Tucker (2021) , Tsang and Shehadeh (2021) , Wang, Chen, and Liu (2020) , Wang et al. (2021) , Wu, Du, and Xu (2015) and references therein). and we assume that the length of each period t ∈ T is sufficiently short such that, without loss of generality, all input parameters are the same from one time period to another (this is the same assumption made in Lei, Lin, and Miao (2014) and Lei, Lin, and Miao (2016) ). The demand, W i,t , of each customer i in each time period t is random. The probability distribution of the demand is unknown, and only a possibly small data on the demand may be available. We assume that we know the mean µ µ µ : We consider the following basic features as in Lei, Lin, and Miao (2014) : (1) each MF has all the necessary service equipment and can move from one place to another; (2) all MFs are homogeneous, providing the same service, and traveling at the same speed; (3) we explicitly account for the travel time of the MF in the model, and service time are only incurred when the MF is not in motion; (4) the travel time t j,j from location j to j is an integer multiplier of a single time period Miao (2014, 2016) ; and (5) the amount of demand to be served is proportional to the duration of the service time at the location serving the demand. We consider a cost f for using an MF, which represents the expenses associated with purchasing or renting an MF, staffing cost, equipment, etc. Each MF has a capacity limit C, which represents the amount of demand that an MF can serve in a single time unit. Due to the random fluctuations of the demand and the limited capacity of each MF, there is a possibility that the MF fleet will fail to satisfy customers' demand fully. To minimize shortage, we consider a penalty cost γ for each unit of unmet demand. This penalty cost can represent the opportunity cost for the loss of demand or expense for outsourcing the excess demand to other companies (Basciftci, Ahmed, and Shen 2021, Lei, Lin, and Miao 2016) . Thus, maximizing demand satisfaction is an important objective that we incorporate in our model (Lei, Lin, and Miao 2014) . Given that an MF cannot provide service when in motion, it is not desirable to keep it moving for a long time to avoid losing potential benefits. On the other hand, it is not desirable to keep the MF stationary all the time because this may lead to losing the potential benefits of making a strategic move to locations with higher demand. Thus, the trade-off of the problem includes the decision to move or keep the MF stationery. Accordingly, we consider a traveling inconvenience cost α to discourage unnecessary moving in cases where moving would neither improve nor degrade the total performance. As in Lei, Lin, and Miao (2014) , we assume that α is much lower than other costs such that its impact over the major trade-off is negligible. We assume that the quality of service a customer receives from a mobile facility is inversely proportional to the distance between the two to account for the "access cost" (this assumption is common in practice and in the literature, see, e.g., Ahmadi-Javid, Seyedi, and Syam (2017), Reilly (1931), Drezner (2014) , Lei, Lin, and Miao (2016) , Berman, Drezner, and Wesolowsky (2003) , Lei, Lin, and Miao (2014) ). Accordingly, we consider a demand assignment cost that is linearly proportional to the distance between the customer point and the location of an MF, i.e., βd i,j , where β ≥ 0 represents the assignment cost factor per demand unit and per distance unit. Table 1 summarizes these notation. Given a set of MFs, M , T , I, and J, our models aim to find: (1) the number of MFs to use within T ; (2) a routing plan and a schedule for the selected MFs, i.e., the node that each MF is located at in each time period; and (3) assignment of MFs to customers. Decisions (1)-(2) are first-stage decisions that we make before realizing the demand. The assignment decisions (3) represent the recourse (second-stage) actions in response to the first-stage decisions and demand realizations (i.e., you cannot assign demand to the MFs before realizing the demand). The objective is to minimize the fixed cost (i.e., cost of establishing the MF fleet and traveling inconvenience cost) plus a risk measure (expectation or mean CVaR) of the operational cost (i.e., transportation and unsatisfied demand costs). We refer to Lei, Lin, and Miao (2014) for an excellent visual representation of MF operations and some of the basic features mentioned above. Additional Notation: For a, b ∈ Z, we define [a] := {1, 2, . . . , a} as the set of positive integer running indices to a. Similarly, we define [a, b] Z := {c ∈ Z : a ≤ c ≤ b} as the set of positive running indices from a to b. The abbreviations "w.l.o.g." and "w.l.o.o." respectively represent "without loss of generality" and "without loss of optimality." In this section, we present a two-stage SP formulation of the MFRSP that assumes that the probability distribution of the demand is known. A complete listing of the parameters and decision variables of the model can be found in Table 1 . First, let us introduce the variables and constraints defining the first-stage of this SP model. For each m ∈ M , we define a binary decision variable y m that equals 1 if MF m is used, and is 0 otherwise. For all j ∈ J, m ∈ M , and t ∈ T , we define a binary decision variable x t j,m that equals 1 if MF m stays at location j at period t, and is 0 otherwise. The feasible region X of variables x x x and y y y is defined in (2). As defined in Lei, Lin, and Miao (2014) , region X represents: (1) the requirement that an MF can only be in service when it is stationary; (2) MF m at location j in period t can only be available at location j = j after a certain period of time depending on the time it takes to travel from location j to j , t j,j (i.e., x t j,m = 0 for all j = j and t ∈ {t, . . . , min{t + t j,j , T }}); and (3) MF m has to be in an active condition before providing service. We refer the reader to Appendix A for a detailed derivation of region X . Now, let us introduce the variables defining our second-stage problem. For all (i, j, m, t), we define a nonnegative continuous variable z t i,j,m to represent the amount of node i demand served by MF m located at j in period t. For each t ∈ T , we define a nonnegative continuous variable u i,t to represent the amount of unmet demand of node i in period t. Finally, we define a random vector W W W := [W 1,1 . . . . , W |I|,|T | ] . Our SP model can now be stated as follows: where for a feasible (x x x,y y y) ∈ X and a realization of W W W Q(x x x,W W W ) := min z,u z,u z,u j∈J i∈I m∈M t∈T distance between any pair of nodes i and j t j,j travel time from j to j C the amount of demand that can be served by an MF in a single time unit, i.e., MF capacity W i,t demand at customer site i for each period t W i,t /W i,t lower/upper bound of demand at customer location i for each period t γ penalty cost for each unit of unmet demand Formulation (3) aims to find first-stage decisions (x x x,y y y) ∈ X that minimize the sum of (1) the fixed cost of establishing the MF fleet (first term); (2) the traveling inconvenience cost 1 (second term); and (3) a risk measure (·) of the random second-stage function Q(x x x,W W W ) (third term). A risk-neutral decision-maker may opt to set (·) = E(·), whereas a risk-averse decision-maker might set (·) as the CVaR or mean-CVaR. Classically, the MFRSP literature employs (·) = E(·), which might be more intuitive for MF providers. For brevity, we relegate further discussion of the mean-CVaR-based SP model to Appendix G. In this section, we present our proposed DRO models for the MFRSP that do not assume that the probability distribution of the demand is known. In Sections 5.1 and 5.2, we respectively present and analyze the risk-neutral MAD-DRO and W-DRO models. For brevity, we relegate the formulations and discussions of the risk-averse mean-CVaR-based DRO models to Appendix F. In this section, we present our proposed MAD-DRO model, which is based on an ambiguity set that incorporates the demand's mean (µ µ µ), MAD (η η η), and support (S). As mentioned earlier, we use the MAD as a dispersion or variability measure because it allows us to derive a computationally attractive reformulation (Postek et al. 2018 , Wang, Zhang, and Tang 2019 , Wang, Chen, and Liu 2020 . First, let us introduce some additional sets and notation defining our MAD ambiguity set and MAD-DRO model. We define E P as the expectation under distribution P. We let and η i,t = E P (|W i,t − µ i,t |) respectively represent the mean value and MAD of W i,t , for all i ∈ I and t ∈ T . Using this notation and the support S defined in (1), we construct the following MAD ambiguity set: where P(S) in F(S,µ µ µ,η η η) represents the set of distributions supported on S with mean µ µ µ and dispersion measure ≤ η η η. Using F(S,µ µ µ,η η η) defined in (5), we formulate our MAD-DRO model as The MAD-DRO formulation in (6) seeks first-stage decisions (x x x,y y y) that minimize the first-stage cost and the worst-case expectation of the second-stage (recourse) cost, where the expectation is taken over all distributions residing in F(S,µ µ µ,η η η). Note that we do not incorporate higher moments (e.g., co-variance) in F(S,µ µ µ,η η η) for the following primary reasons. First, the mean and range are intuitive statistics that a decision-maker may approximate and change in the model (e.g., the mean may be estimated from limited data or approximated by subject matter experts, and the range may represent the error margin in the estimates). Second, it is not straightforward for decisionmakers to approximate or accurately estimate the correlation between uncertain parameters. Third, mathematically speaking, various studies have demonstrated that incorporating higher moments in the ambiguity set often undermines the computational tractability of DRO models and, therefore, their applicability in practice. Indeed, as we will show next, using F(S,µ µ µ,η η η) allows us to derive a tractable equivalent reformulation of the MAD-DRO model and an efficient solution method to solve the reformulation (see Sections 5.1.1, 6.1, and 7.2). Finally, note that parameters W i,t , µ i,t , W i,t , W i,t , η i,t are all indexed by time period t and location i. Thus, if in any application, there is a relationship (e.g., correlation) between the demand of a subset of locations in a subset of periods, one can easily adjust S, µ µ µ, and η η η in F(S,µ µ µ,η η η) to reflect this relationship. For example, if urban cities have higher demand, then we can adjust the mean and range of the demand of these cities to reflect such a relationship. Similarly, if there is a correlation between the time period t and the demand, then we can define the mean and the support based on this correlation. For example, the morning service hours may have lower demand on Monday. In this case, we can adjust µ µ µ, S, and η η η to reflect this relationship. Similarly, if the demand in a subset of periods and locations is correlated, we can adjust S, µ µ µ, and η η η in F(S,µ µ µ,η η η) to reflect this relationship. Nevertheless, we acknowledge that not incorporating higher moments and complex relations may be a limitation of our work and thus is worth future investigation. Recall that Q(x x x,W W W ) is defined by a minimization problem; hence, in (6), we have an inner maxmin problem. As such, it is not straightforward to solve formulation (6) in its presented form. In this section, we derive an equivalent formulation of (6) that is solvable. First, in Proposition 1, we present an equivalent reformulation of the inner problem in (6) (see Appendix B for a proof). Proposition 1. For any fixed (x x x,y y y) ∈ X , problem sup Again, the problem in (7) involves an inner max-min problem that is not straightforward to solve in its presented form. However, we next derive an equivalent reformulation of the inner problem in (7) that is solvable. First, we observe that Q(x x x,W W W ) is a feasible linear program (LP) for a given first-stage solution (x x x,y y y) ∈ X and a realization of W W W . The dual of Q(x x x,W W W ) is as follows. where λ λ λ and v v v are the dual variables associated with constraints (4b) and (4c), respectively. It is easy to see that w.l.o.o, λ i,t ≥ 0 for all i ∈ I due to constraints (8c) and the objective of maximizing (8b) and (8d). Given the objective of maximizing a nonnegative term Cx t j,m multiplied by v t j,m , v t j,m = min{min i {βd i,j − λ i,t }, 0} in the optimal solution. Given that β β β, d d d, and λ λ λ are finite, v v v is finite. It follows that problem view of dual formulation (8), we can rewrite the inner maximization problem max{·} in (7) as max λ,v,W,k λ,v,W,k λ,v,W,k t∈T i∈I Note that the objective function in (9) contains the interaction term λ i,t W i,t . To linearize formulation (9), we define π i,t = λ i,t W i,t for all i ∈ I and t ∈ T . Also, we introduce the following McCormick inequalities for variables π i,t : Accordingly, for a fixed (x x x ∈ X ,ρ ρ ρ,ψ ψ ψ), problem (9) is equivalent to the following mixed-integer linear Combining the inner problem in the form of (11) with the outer minimization problems in (7) and (6), we derive the following equivalent reformulation of the MAD-DRO model in (6): min x,y,ρ,ψ,δ x,y,ρ,ψ,δ x,y,ρ,ψ,δ m∈M where h(x x x,ρ ρ ρ,ψ ψ ψ) = max λ,v,W,π,k λ,v,W,π,k λ,v,W,π,k t∈T i∈I Proposition 2. For any fixed values of variables (x x x,y y y) ∈ X , ρ ρ ρ, and ψ ψ ψ, h(x x x,ρ ρ ρ,ψ ψ ψ) < +∞. Fur- is a convex piecewise linear function in x x x, ρ ρ ρ, and ψ ψ ψ with a finite number of pieces (see Appendix C for a detailed proof ). In this section, we consider the case that P may be observed via a possibly small finite set and F 2 , respectively, where probability distributions F 1 and F 2 are defined over the common support S. The 1-Wasserstein distance dist(F 1 , F 2 ) between F 1 and F 2 is the minimum transportation cost of moving from F 1 to F 2 , where the cost of moving masses where P(F 1 , F 2 ) is the set of all joint distributions of (W W W 1 ,W W W 2 ) supported on S with marginals (F 1 , F 2 ). Accordingly, we construct the following 1-Wasserstein ambiguity set: where P(S) is the set of all distributions supported on S,P N = 1 N N n=1 δŴ W W n is the empirical distribution of W W W based on N i.i.d samples, and > 0 is the radius of the ambiguity set. The set F(P N , ) represents a Wasserstein ball of radius centered at the empirical distributionP N . Using the ambiguity set F(P N , ) defined in (14), we formulate our W-DRO model as Formulation (15) finds first-stage decisions (x x x,y y y) ∈ X that minimize the first-stage cost and the maximum expectation of the second-stage cost over all distributions residing in F(P N , ). The W-DRO model in (15) can be used to model uncertainty in general and distributional ambiguity when there is a possibly small finite data sample on uncertainty. As detailed in Mohajerin Esfahani and Kuhn (2018) and discussed earlier, if we have a small sample and we optimize using this sample, then the optimizer's curse cannot be avoided. To mitigate the optimizer's curse (estimation error), we robustify the nominal decision problem (the MFRSP optimization problem) against all distributions P under which the estimated distributionP N based on the N data points has a small estimation error (i.e., with dist(P,P N ) ≤ ). Therefore, in some sense one can think of Wasserstein ball as the set of all distributions under which our estimation error is below , where is the maximum estimation error against which we seek protection. When = 0, the ambiguity set contains the empirical distribution and the W-DRO problem in (15) reduces to the SP problem. A larger radius indicates that we seek more robust solutions (see Appendix K). In the next section, we show that using 1 -norm instead of the p -norm (1 < p < ∞) in our Wasserstein ambiguity set allows us to derive a linear and tractable reformulation of the W-DRO model in (15). Note that 1 -norm (i.e., the sum of the magnitudes of the vectors in space) is the most intuitive and natural way to measure the distance between vectors. In contrast, the ∞norm-based Wasserstein ball is an extreme case. That is, the ∞-norm gives the largest magnitude among each element of a vector. Thus, from the perspective of the Wasserstein DRO framework, the ∞-norm-based distance metric only picks the most influential value to determine the closeness between data points (Chen and Paschalidis 2018) , which, in our case, may not be reasonable since every demand point plays a role. Deriving and comparing DRO models with different Wasserstein sets is out of the scope of this paper but is worth future investigation in more comprehensive MF optimization problems. In this section, we derive an equivalent solvable reformulation of the W-DRO model in (15). First, in Proposition 3 we present an equivalent dual formulation of the inner maximization problem sup [·] in (15) (see Appendix E for a detailed proof). Proposition 3. For for a fixed (x x x,y y y) ∈ X , problem sup Formulation (16) is potentially challenging to solve because it require solving N non-convex optimization problems. Fortunately, given that the support of W W W is rectangular and finite (see Assumption 1) and Q(x x x,W W W ) is feasible and bounded for every x x x and W W W , we next recast these inner problems as LPs for each ρ ≥ 0 and x x x ∈ X . First, using the dual formulation of Q(x x x,W W W ) in (8), we rewrite the inner problem sup{·} in (16) for each n as Second, using the same techniques in Section 5.1.1, we define an epigraphical random variable η n i,t for the term |W i,t − W n i,t |. Then, using variables η η η, π i,t = λ i,t W i,t , and inequalities (10a)-(10b) for variables π i,t , we derive the following equivalent reformulation of (17) (for each n ∈ [N ]): Third, combining the inner problem in the form of (18) with the outer minimization problems in (16) and (15), we derive the following equivalent reformulation of the W-DRO model in (15) Z(N, ) = min Using the same techniques in the proof of Proposition 2, one can easily verify that function Cx t j,m v t j,m ] < ∞ and is a convex piecewise linear function in x x x ∈ X and ρ. In this section, we present a decomposition-based algorithm to solve the MAD-DRO formulation in (12), and strategies to improve the solvability of the formulation. The algorithmic steps for solving the W-DRO in (19) are similar. In Section 6.1, we present our decomposition algorithm. In Section 6.2, we derive valid lower bound inequalities to strengthen the master problem in the decomposition algorithm. In Section 6.3, we derive two families of symmetry breaking constraints that improve the solvability of the proposed models. Proposition 2 suggests that constraint (12c) describes the epigraph of a convex and piecewise linear function of decision variables in formulation (12). Therefore, given the two-stage characteristics of MAD-DRO in (12), it is natural to attempt to solve problem (12) via a separation-based decomposition algorithm. Algorithm 1 presents our proposed decomposition algorithm, and the algorithm for the W-DRO model in (19) has the same steps. Algorithm 1 is finite because we identify a new piece of the function max λ,v,W,π,k λ,v,W,π,k λ,v,W,π,k t∈T i∈I is augmented in step 4, and the function has a finite number of pieces according to Proposition 2. Note that this algorithm is based on the same theory and art of cutting plane-based decomposition algorithms employed in various other papers using decomposition to solve problems with similar structure. Nevertheless, we customized Algorithm 1 to solve our proposed DRO models. In addition, in the following subsections, we derive problem-specific valid inequalities to strengthen the master problem, thus improving convergence. In this section, we aim to incorporate more second-stage information into the first-stage without adding optimality cuts into the master problem by exploiting the structural properties of the Algorithm 1: Decomposition algorithm. 2. Master Problem. Solve the following master problem and record the optimal solution (x x x * ,ρ ρ ρ * ,ψ ψ ψ * , δ * ) and optimal value Z * . Set LB = Z * . 3. Sub-problem. and record optimal solution (π * , λ * , W * , v * , k * π * , λ * , W * , v * , k * π * , λ * , W * , v * , k * ) and h(x x x,ρ ρ ρ,ψ ψ ψ) * . then stop and return x x x * and y y y * as the optimal solution to problem (20) (i.e., MAD-DRO in (12)). else add the cut δ ≥ t∈T i∈I to the set of cuts {L(x x x, δ) ≥ 0} and go to step 2. end if recourse problem. We first observe that once the first-stage solutions and the demand are known, the second-stage problem can be decomposed into independent sub-problems with respect to time periods. Accordingly, we can construct cuts for each sub-problem in step 4. Let δ t represent the optimality cut for each period t, we replace δ in (20a) with t δ t and add constraints The original single cut is the summation of multiple cuts of the form, i.e., δ = t∈T δ t . Hence, in each iteration, we incorporate more or at least an equal amount of information into the master problem using (22) as compared with the original single cut approach. In this manner, the optimality cuts become more specific, which may result in better lower bounds and, therefore, a faster convergence. In Proposition 4, we further identify valid lower bound inequalities for each time period to tighten the master problem (see Appendix H for a proof). i∈I min{γ, min It follows that for all t ∈ T , are valid. Suppose there are three homogeneous MFs. As such, solutions y = [1, 1, 0] , y = [0, 1, 1] , and y = [1, 0, 1] are equivalent (i.e., yield the same objective) in the sense that they all permit 2 out of 3 MFs to be used in the planning period. To avoid wasting time exploring such equivalent solutions, we assume that MFs are numbered sequentially and add constraints (24) to the first-stage. Constraints (24) enforce arbitrary ordering or scheduling of MFs. Second, recall that in the first period, t = 1, we decide the initial locations of the MFs. Therefore, it doesn't matter which MF is assigned to location j. For example, suppose that we have three candidate locations, and MFs 1 and 2 are active. Then, feasible solutions (x 1 1,1 = 1, x 1 3,2 = 1) and (x 1 1,2 = 1, x 1 3,1 = 1) yield the same objective. To avoid exploring such equivalent solutions, we define a dummy location J + 1 and add constraints (25a)-(25b) to the first-stage. Constraints (25a)-(25b) are valid for any formulation that uses the same sets of first-stage routing and scheduling decisions and constraints. We derived constraints (24) Although breaking symmetry is very important and standard in integer programming problems, our paper is the first to attempt to break the symmetry in the solution space of the first-stage planning decisions of the MFRSP. In Section 7.3, we demonstrate the computational advantages that could be gained by incorporating these inequalities. In this section, we conduct extensive computational experiments comparing the proposed DRO models and a sample average approximation (SAA) of the SP model computationally and operationally, demonstrating where significant performance improvements could be gained. The sample (3) with P replaced by an empirical distribution based on N samples of W W W (see Appendix I for the formulation). In Section 7.1, we describe the set of problem instances and discuss other experimental setups. In Section 7.2, we compare solution time of the proposed models. In Section 7.3, we demonstrate efficiency of the proposed lower bound inequalities and symmetry breaking constraints. We compare optimal solutions of the proposed models and their out-of-sample performance in Sections 7.4 and 7.5, respectively. We analyze the sensitivity of the DRO expectation models to different parameter settings in Section 7.6. We close by comparing the risk-neutral and risk-averse models under critical parameter settings in Section 7.7. We constructed 10 MFRSP instances, in part based on the same parameters settings and assumptions made in Lei, Lin, and Miao (2014) and Lei, Lin, and Miao (2016) . We summarize these instances in Table 2 . Each of the 10 instances is characterized by the number of customers locations I, number of candidate locations J, and the number of periods T . Instances 1-4 are from Lei, Lin, and Miao (2014) and Instances 5-10 are from Lei, Lin, and Miao (2016) . These benchmark instances represent a wide range of potential service regions in terms of problem size as a function of the number of demand nodes/locations and time periods. For example, if we account for the scale of the problem in the sense of a static facility location problem, instance 10 consists of 30 × 20 = 600 customers, which is relatively large for many practical applications. In addition, we constructed a service region based on 20 selected nodes (see Figure 2 ) in Lehigh County of Pennsylvania (USA). Then, as detailed below, we constructed two instances (denoted as Lehigh 1 and Lehigh 2) based on this region. Table 2 , we generated a total of I vertices as uniformly distributed random numbers on a 100 by 100 plane and computed the distance between each pair of nodes in Euclidean sense as in Lei, Lin, and Miao (2014) . For Lehigh county instances, we first extracted the latitude and longitude of each node and used Bing Maps Developer API to compute the travel time in minutes between each pair of nodes. We followed the same procedures in the DRO scheduling and facility location literature to generate random parameters as follows. For instances 1-10, we generated µ i,t from a uniform distribution 20, 60] , and set the standard deviation σ i,t = 0.5µ i,t , for all i ∈ I and t ∈ T . For Lehigh Map of 20 cities in Lehigh County. We created this map using the geoscatter function (MATLAB). county instances, we used the population estimate for each node based on the most updated information posted on the 2010 US Census Bureau (see Appendix J) to construct the following two demand structures. In Lehigh 1, we generated the demand's mean as follows: if the population ≥10,000, we set µ i,t = 40 (i.e., the mean of U [20, 60]); if the population ∈ [5,000, 10,000), we set the mean to µ = 30; if the population ∈ [1,000, 5,000), we set µ i,t = 20; and if the population < 1, 000, we set µ i,t = 15. In Lehigh 2, we use the population percentage (weight) at each node to generate the demand's mean as µ i,t = min(60, population% × 1000). To a certain extent, these structures reflect what may be observed in real life, i.e., locations with more population may potentially create greater demand. We refer to Appendix J for the details related to Lehigh 1 and Lehigh 2. For each instance, we sample N realizations W n i,t , . . . , W n I,T , for n = 1, . . . , N , by following lognormal (LogN) distributions with the generated µ i,t and σ i,t . We round each parameter to the nearest integer. We solve the SAA and W-DRO models using the N sample and the MAD-DRO model with We assume that all cost parameters are calculated in terms of present monetary value. Specifically, as in Lei, Lin, and Miao (2014) , for each instance, we randomly generate (1) the fixed cost from a uniform distribution U [a, b] with a = 1000 and b = 1500; (2) the assignment cost factor per unit distance per unit demand β from U [0.0001a, 0.0001b]; and (3) the penalty cost per unit demand γ form U [0.01a, 0.01b]. Finally, we set the traveling inconvenience cost factor to 0.0001a, and unless stated otherwise, we use a capacity parameter C = 100. We implemented all models and the decomposition algorithm using AMPL2016 Programming language calling CPLEX V12.6.2 as a solver with default settings. We run all experiments on a MacBook Pro with Apple M1 Max Chip, 32GB of memory, and 10-core CPUs. Finally, we imposed a solver time limit of 1 hour. In this section, we analyze solution times of the proposed DRO models. We consider two ranges of the demand: W W W ∈ [20, 60] (base-case) and W W W ∈ [50, 100] (higher demand variability and volume). We also consider two MF capacities: C = 60 (relatively small capacity) and C = 100 (relatively large capacity). We focus on solving problem instances with a small sample size, which is often seen in most real-world applications (especially in healthcare) and is the primary motivation for using DRO. Specifically, we use N = 10 and N = 50 as a sample size for the W-DRO model. For each of the 10 instances in Table 2 , demand range, C, and N , we generated and solved 5 instances using each model as described in Section 7.1. Let us first analyze solution times of the risk-neutral DRO models. Tables 3-4 Intuitively, when C = 100, each facility can satisfy more demand than C = 60. Thus, there are more feasible choices for the MF fleet size and schedule when C = 100. That said, the search space when C = 100 is larger, potentially leading to a longer computational time. Third, using the MAD-DRO model, we were able to solve all instances within the time limit. 9 12 17 3 3 4 15 16 18 6 6 7 3 15 10 16 32 55 3 3 3 35 48 61 6 7 9 4 15 20 30 34 43 3 3 3 71 81 85 8 9 10 5 20 10 29 42 76 3 3 3 75 79 81 9 10 11 6 20 20 82 92 108 3 3 3 110 182 229 7 11 13 7 25 10 108 123 135 9 10 11 151 192 243 9 10 12 times of the W-DRO model with N = 50 are generally longer than the MAD-DRO model. This also makes sense because the MAD-DRO model is a smaller deterministic model (i.e., it has fewer variables and constraints). Let us now analyze solution times of the risk-averse models. We use MAD-CVaR (W-CVaR) to denote the mean-CVaR-based DRO model with MAD (1-Wasserstein) ambiguity. Since we observe similar computational performance with different values of Θ ∈ [0, 1), we present results with Θ = 0. the W-CVaR model, we were able to solve Instances 1-5. Solution times of these instances are generally longer than the risk-neutral model, especially when N = 50. It is not surprising that the CVaR models are more computationally challenging to solve than the risk-neutral models because the former models are larger (have more variables and constraints). In particular, the master problem of the CVaR models in the decomposition algorithm is larger than the expectation models (see Algorithm 2 in Appendix F). Finally, it is worth mentioning that using an enhanced multicut L-shaped (E-LS) method to solve their SP model, Lei, Lin, and Miao (2014) were able to solve Instance 1-4. The average solution time of Instance 4 using E-LS is 3000 seconds obtained at 5% optimality gap. The CVaR-based SP model is more challenging to solve than the risk-neutral SP model. In this section, we study the efficiency of symmetry breaking constraints (24)-(25) and lower bounding inequalities (23). Given the challenges of solving large instances without (23)-(25), we use Instance 1 with W W W ∈ [20, 60] and C = 100 in this experiment. First, we separately solve the proposed models with and without symmetry-breaking (SB) constraints (24)-(25). First, we observe that without these SB constraints, solution times of Instance 1 using (W-DRO, MAD-DRO, SP) significantly increase from (20, 6, 70) to (1,765, 1,003, 3,600) seconds. Instances 3-10 terminated with a large gap after one hour without these SB constraints. In this section, we compare the optimal solutions of the SP, MAD-DRO, and W-DRO models. Given that the SP model can only solve small instances to optimality, for a fair comparison and brevity, we use Instance 3 (I = 15, J = 15, and T = 10) as an example of an average-sized instance. In addition, we present results for Lehigh 1 and Lehigh 2. Table 9 presents the number of MFs (i.e., fleet size) for each instance. We observe the following from Tripoli, and Wescosvill and Slatedale. That is, in the second period, we decreased the demand of the 6 nodes with the highest demand (Allentown-Wescosvill) to that of the nodes that generate the lowest demand (Alburtis-Slatedale) and increased the demand of (Alburtis-Slatedale) to that of (Allentown-Wescosvill). We refer to Table 13 in Appendix J for a summery of the average demand of each node in period 1 and period 2. Figure 5 illustrates the MFs' locations in period 1 (Figure 5a) and period 2 (Figure 5b ). We provide a summary of these results in Table 14 in Appendix J. We observe the following about the initial locations in period 1 (Figure 5a) Table 13 in Appendix J). (a) Initial locations of the MFs in period 1. (b) Locations of the MFs in period 2. Optimal MF locations in period 1 and period 2 (Lehigh 2 instance). Color code: the red square is MF, and the black circle is a demand node/city. We make the following observations from the results in period 2 presented in Figure 5b . First, it is clear that all MFs moved from their initial locations to other locations in period 2 to accommodate the change in the demand. Second, all models scheduled one MF at Alburtis, where the average demand increased from 9 to 60 (average demand of Allentown in period 1). This makes sense because, in period 2, Alburtis generates the highest demand. Third, the DRO models scheduled one MF at Trexlertown and one at New Tripoli, where the average demand increased from 8 and 3 to 43 (average demand of Emmaus in period 1) and 23 (average demand of Catasauqua in period 1), respectively. The SP and W-DRO models scheduled one MF at Cetronia, where the demand increased from 8 to 60. In this section, we compare the operational performance of the optimal solutions to Instance 3, Lehigh 1, and Lehigh 2 via out-of-sample simulation. First, we fix the optimal first-stage decisions yielded by each model in the second-stage of the SP. Then, we solve the second-stage problem in (4) with the fixed first-stage decisions and the following sets of N = 10, 000 out-of-sample data of W n i,t , for all i ∈ I, t ∈ T, and n ∈ [N ], to compute the corresponding out-of-sample second-stage cost. Set 1. Perfect distributional information. We use the same settings and distribution (LogN) that we use for generating the N data in the optimization to generate N data. This is to simulate the performance when the true distribution is the same as the one used in the optimization. Set 2. Misspecified distributional information. We follow the same out-of-sample simulation procedure described in Wang, Chen, and Liu (2020) and employed in Shehadeh and Tucker (2021) to generate the N = 10, 000 data. Specifically, we perturb the distribution of the demand by a parameter ∆ and use a parameterized uniform distribution U [(1 − ∆)W , (1 + ∆)W ] for which a higher value of ∆ corresponds to a higher variation level. We apply ∆ ∈ {0, 0.25, 0.5} with ∆ = 0 indicating that we only vary the demand distribution from LogN to Uniform. This is to simulate the performance when the true distribution is different from the one we used in the optimization. In addition, we generate N correlated data points with 0.2 and 0.6 correlation coefficients. For brevity, we next discuss simulation results for the solutions obtained with N = 10. We observe similar results for solutions obtained with N = 50 (see Appendix M for these results). In Figures 6, 7 , and 8, we present the normalized histograms of out-of-sample total costs (TC) and second-stage costs (2nd) for Instance 3 (with W W W ∈ [20, 60], N = 10), Instance 3 (with W W W ∈ [50, 100], N = 10), and Lehigh 1 (N = 10). We obtained similar results for Lehigh 2 (see Appendix M). We computed TC as TC=first-stage cost+out-of-sample second-stage cost. Let us first analyze simulation results under Set 1 (i.e., perfect distributional information case) presented in Figures 6a-6b , 7a-7b, and 8a-8b. The MAD-DRO model yields a higher TC on average and at upper quantiles than the W-DRO and SP models because it schedules more MFs and thus yields a higher fixed cost (i.e., cost of establishing the MF fleet). The W-DRO model yields a slightly higher TC than the SP model because it schedules more MFs (and thus yeild a higher fixed cost). However, the DRO models yield significantly lower second-stage (transportation and unmet demand) costs on average and at all quantiles than the SP model. In addition, the MAD-DRO model yields a lower second-stage cost than the W-DRO model on average and at all quantile, especially for Lehigh 1. Note that a lower second-stage cost indicates a better operational performance (i.e., lower shortage and transportation costs) and thus has a significant practical impact. These results suggest that there are benefits to using the DRO models even when we have perfect distributional information. We observe the following from simulation results under Set 2 (i.e., misspecified distributional information case) presented in Figures 6c-6h, 7c-7h , and 8c-8h. It is clear from these figures that Next, we investigate the value of distributional robustness from the perspective of out-of-sample disappointment, which measures the extent to which the out-of-sample cost exceeds the model's optimal value (Van Parys, Kuhn 2021, Wang, Chen, and Liu 2020) . We define OPT and TC as the model's optimal value and the out-of-sample objective value, respectively. That is, OPT and TC can be considered as the estimated and actual costs of implementing the model's optimal solutions in practice, respectively. Using this notation, we define the out-ofsample disappointment as in Wang, Chen, and Liu (2020) as follows. A disappointment of zero indicates that the model's optimal value is equal to or larger than the outof-sample (actual) cost (i.e., TC≤OPT). This, in turn, indicates that the model is more conservative and avoids underestimating costs. In contrast, a larger disappointment implies a higher level of over-optimism because, in this case, the actual cost (TC) of implementing the optimal solution of a model is larger than the estimated cost (OPT). In addition, it is clear that the average and upper quantiles of the disappointments of the SP model are relatively very large (e.g., exceeding 100% for Lehigh 1). Finally, we observe that the out-of-sample disappointment of the MAD-DRO model is more stable than the W-DRO and SP models with a smaller standard deviation. We remark that these observations are consistent for the other considered instances, and the results with ∆ = 0.5 are similar to those with ∆ = 0.25. This demonstrates that the DRO model provides a more robust estimate of the actual cost that we will incur in practice. The results in this section demonstrate that the DRO approaches are effective in an environment where the distribution is hard to estimate (ambiguous), quickly changes, or when there is a small data set on demand variability. Moreover, these results emphasize the value of modeling uncertainty and distributional ambiguity. In this section, we study the sensitivity of DRO models to different parameter settings. Given (a higher volume of the demand). Figures 10 and 11 a higher number of MFs, especially when C is tight. As such, the MAD-DRO model often has a slightly higher total cost (due to the higher fixed cost of establishing a larger fleet) and better second-stage cost, i.e., better operational performance (see Figures 25-27) . For example, consider Instance 1. When f = 6, 000 and C = 50, the W-DRO and MAD-DRO models schedule 8 and 10 MFs, respectively. The associated (total, second-stage) costs of these solutions are respectively (74, 495, 26, 495) and ( In this section, we analyze the optimal solutions of the mean-CVaR-based models under some critical problem parameters. Specifically, we solve the models with f ∈{1,500, 6,000, 10,000}, γ ∈ Next, we compare the out-of-sample operational performance (i.e., second-stage cost) and disappointment of the risk-neutral (i.e., expectation models) and risk-averse models (i.e., mean-CVaRbased models with Θ = 0, or equivalently, risk-averse models with CVaR criterion). We use MAD-E (W-E) to denote the risk-neutral DRO model with MAD (1-Wasserstein) ambiguity presented in Section 5.1 (Section 5.2). In addition, we use SP-E to denote the risk-neutral SP model. In Table 10 Comparison Our results in this section demonstrate that the distributionally robust CVaR models tend to hedge against uncertainty, ambiguity, and risk by scheduling more MFs. Our results also indicate that the proposed DRO expectation models may be risk-averse under some parameter settings (e.g., high unmet demand penalty and low cost). In this paper, we propose two DRO models for the MFRSP. Specifically, given a set of MFs, a planning horizon, and a service region, our models aim to find the number of MFs to use within the planning horizon and a route and schedule for each MF in the fleet. The objective is to minimize the fixed cost of establishing the MF fleet plus a risk measure (expectation or mean-CVaR) of the operational cost over all demand distributions defined by an ambiguity set. In the first model (MAD-DRO), we use an ambiguity set based on the demand's mean, support, and mean absolute deviation. In the second model (W-DRO), we use an ambiguity set that incorporates all distributions within a 1-Wasserstein distance from a reference distribution. To solve the proposed DRO models, we propose a decomposition-based algorithm. We also derive lower bound inequalities and two families of symmetry breaking constraints to improve the solvability of the proposed models. Our computational results demonstrate (1) We suggest the following areas for future research. First, we aim to extend our models to optimize the capacity and size of the MF fleet. Second, we want to extend our approach by incorporating multi-modal probability distributions and more complex relationships between random parameters (e.g., correlation). Third, we aim to extend our approach to more comprehensive MF planning models, which consider all relevant organizational and technical constraints and various sources of uncertainties (e.g., travel time) with a particular focus on real-life healthcare settings. Although conceptually and theoretically advanced, stochastic optimization approaches such as SP and DRO are not intuitive or transparent to decision-makers who often do not have optimization expertise. Thus, future efforts should also focus on closing the gap between theory and practice. Appendix A: Derivation of feasible region X in (2) In this Appendix, we provide additional details on the derivation of the constraints defining the feasible region X of variables (x x x,y y y). As described in Lei, Lin, and Miao (2014) , we can enforce the requirement that an MF can only be in service when it is stationary using the following constraints: x t j,m + x t j ,m ≤ 1, ∀t, m, j, j = j , t ∈ {t, . . . , min{t + t j,j , T }}. If x t j,m = 1 (i.e., MF m is stationary at some location j in period t), it can only be available at location j = j after a certain period, depending on the time it takes to travel from location j to location j . It follows by (27) that x t j ,m = 0 for all j = j , t ∈ {t, . . . , min{t + t j,j , T }}. As pointed out by Lei, Lin, and Miao (2014) , this indicates that an earlier decision of deploying an MF at one candidate location would directly affect future decisions both temporally and spatially. In fact, this correlation is a major source of complexity for optimizing the MFRSP. Since the MF has to be in an active condition before providing service, we have to include the following constraints: x t j,m ≤ y m , ∀j, m, t. It is straightforward to verify that constraint sets (27) and (28) can be combined into the following compact form: Appendix B: Proof of Proposition 1 Proof. For a fixed x ∈ X , we can explicitly write the inner problem sup[·] in (6) as the following functional linear optimization problem. Letting ρ i,t , ψ i,t and θ be the dual variables associated with constraints (30b), (30c), (30d), respectively, we present problem (30) (problem (9) in the main manuscript) in its dual form: min ρ,θ,ψ≥0 ρ,θ,ψ≥0 ρ,θ,ψ≥0 t∈T i∈I where ρ ρ ρ and θ are unrestricted in sign, ψ ψ ψ ≥ 0, and constraint (31b) is associated with the primal variable P. Note that for fixed (ρ, ψ, ρ, ψ, ρ, ψ, θ) , constraint (31b) is equivalent to Since we are minimizing θ in (31), the dual formulation of (30) is equivalent to: min ρ ρ ρ,ψ ψ ψ≥0 t∈T i∈I Appendix C: Proof of Proposition 2 First, note that the feasible region Ω = {(8b) − (8d), (10a) − (10b), (11c)} and S are both independent of x x x, ρ ρ ρ, and ψ ψ ψ and bounded. In addition, the MFRSP has a complete recourse (i.e., the recourse problem is feasible for any feasible (x x x,y y y) ∈ X ). Therefore, max λ,v,W,π,k λ,v,W,π,k λ,v,W,π,k t∈T i∈I Second, for any fixed π, v, W, k π, v, W, k π, v, W, k, t∈T i∈I is a linear function of x x x, ρ ρ ρ, and ψ ψ ψ. It follows that max λ,v,W,π,k λ,v,W,π,k λ,v,W,π,k t∈T i∈I is the maximum of linear functions of x x x, ρ ρ ρ, and ψ ψ ψ, and hence convex and piecewise linear. Finally, it is easy to see that each linear piece of this function is associated with one distinct extreme point of Ω and S. Given that each of these polyhedra has a finite number of extreme points, the number of pieces of this function is finite. This completes the proof. The results of Long, Qi, and Zhang (2021) indicate that if the second-stage optimal value is supermodular in the realization of uncertainties under the MAD ambiguity set, the worst-case is a distribution supported on (2n + 1) points, where n is the dimension of the random vector. In this Appendix, we show that even if our recourse is supermodular in demand realization, the number of points in the worst-case distribution of the demand is large, which renders our two-stage MAD-DRO model computationally challenging to solve using Long, Qi, and Zhang (2021) ' approach Recall that the demand is indexed by i and t, i.e., W i,t , for all i ∈ I and t ∈ T . Thus, the dimension of our random vector is |I||T |. Accordingly, assuming that the second-stage optimal value is supermodular, then the resullts of Long, Qi, and Zhang (2021) suggest that the worst-case distribution in MAD-DRO has (2|I||T |+1) points or scenarios. Note that |I| and |T | and thus 2|I||T | is large for most instances of our problem (See Example 1-2 below). Since our computational results and prior literature indicate that solving the scenario-based model using a small set of scenarios is challenging, solving a reformulation of MAD-DRO using the (2|I||T |+1) points is expected to be computationally challenging. Example 1. Instance 6 (I = 20 and T = 20). The worst-case distribution has 801 points. Example 3. Instance 10 (I = 30 and T = 20). The worst-case distribution has 1201 points. Recall thatP N = 1 N n n=1 δŴ W W n . The definition of Wasserstein distance indicates that there exist a joint distribution Π of (W W W ,Ŵ W W ) such that E Π [||W W W −Ŵ W W ||] ≤ . In other words, for any P ∈ P(S), we can rewrite any joint distribution Π ∈ P(P,P N ) by the conditional distribution of W W W givenŴ W W =Ŵ W W n for n = 1, . . . , N , denoted as F n . That is, Π = 1 N N n=1 F n × δξ n . Notice that if we find one joint distribution Π ∈ P(P,P N ) such that ||W W W −Ŵ W W || dΠ ≤ , then dist(P,P N ) ≤ . Hence, we can drop the infimum operator in Wasserstein distance and arrive at the following equivalent problem. Using a standard strong duality argument and letting ρ ≥ 0 be the dual multiplier, we can reformulate problem (32) by its dual, i.e. Appendix F: DRO with mean-CVaR as a Risk Measure Both the MAD-DRO and W-DRO model presented in Section 5 assume that the decision-maker is riskneutral (i.e., adopt the expected value of the recourse as a risk measure). In some applications of the MFRSP, however, decision-makers might be risk-averse. Therefore, as one of our reviewers suggested, in this section, we present a distributionally robust risk-averse model for the MFRSP. To model the decision maker's risk aversion, most studies adopt the CVaR, i.e., set (·) = CVaR κ (·), where κ ∈ (0, 1). CVaR is the conditional expectation of Q(·) above the value-at-risk VaR (informally, VaR is the κ quantile of the distribution of Q(·), see Paç and Pınar (2014) , Rockafellar and Uryasev (2002) , Sarykalin, Serraino, and Uryasev (2008) , Van Parys et al. (2015)). CVaR is a popular coherent risk measure widely used to avoid solutions influenced by a bad scenario with a low probability. However, as pointed out by Wang et al. (2021) and a reviwer of this paper, neither expected value nor CVaR can capture the variability of uncertainty in a comprehensive manner. Alternatively, we consider minimizing the mean-CVaR, which balances the cost on average and avoids high-risk levels. As pointed out by Lim, Shanthikumar, and Vahn (2011) , Wang et al. (2021) , and our reviewer, the traditional CVaR criterion is sensitive to the misspecification of the underlying loss distribution and lacks robustness. Therefore, we propose a distributionally robust mean-CVaR model to remedy such fragility, reflecting both risk-averse and ambiguity-averse attitudes. For brevity, we use the MAD ambiguity set to formulate and analyze this model because similar formulation and reformulation steps can be used to derive a solvable mean-CVaR-based model based on the 1-Wasserstein ambiguity. Let us now introduce our distributionally robust mean-CVaR-based model with MAD-ambiguity (MAD-CVaR). First, following Rockafellar, Uryasev et al. (2000) , Rockafellar and Uryasev (2002) , and Van Parys et al. (2015), we formally define CVaR as where [c] + := max{c, 0} for c ∈ R. Parameter κ measures a wide range of risk preferences, where κ = 0 corresponds to the risk-neutral formulation. In contrast, when κ → 1, the decision-makers become more riskaverse. Using (34), we formulate the following MAD-CVaR model (see, e.g., Wang et al. (2021) for a recent application in facility location): where 0 ≤ Θ ≤ 1 is the risk-aversion coefficient, which represents a trade-off between the risk-neutral (i.e., ) and risk-averse (i.e., CVaR(·)) objectives. A larger Θ implies less aversion to risk, and vice verse. In extreme cases, when Θ = 1, the decision maker is risk-neutral and (35) reduces to the MAD-DRO expectation model in (6). When Θ = 0, the decision maker is risk and ambiguity averse. Next, we derive a solvable reformulation of (35). Let us first consider the inner maximization problem sup P∈F(S,µ µ µ,η η η) CVaR(Q(x x x,W W W )) in (35). It is easy to verify that sup P∈F(S,µ µ µ,η η η) CVaR(Q(x x x,W W W )) = sup P∈F(S,µ µ µ,η η η) Interchanging the order of sup P∈F(S,µ µ µ,η η η) and inf ζ∈R follows from Sion's minimax theorem (Sion 1958) because is convex in ζ and concave in P. Next, we apply the same techniques in Section 5.1.1 and Appendix B to reformulate the inner maximization problem sup P∈F(S,µ µ µ,η η η) (36c) as a minimization problem and combine it with the outer minimization problem to obtain inf ζ,ρ ρ ρ,θ,ψ ψ ψ≥0 t∈T i∈I where (37b) and (37c) follows from the definition of [·] + . Accordingly, problem (36) (equivalently last term in formulation in (35)) is equivalent to inf ζ,ρ ρ ρ,θ,ψ ψ ψ≥0 Since we are minimizing ζ in (38a), we can equivalently re-write (38a) as inf ρ ρ ρ,θ,ψ ψ ψ≥0 where (39) follows from the derivation of h(x x x,ρ ρ ρ,ψ ψ ψ) ≡ max Section 5.1.1. Accordingly, problem (38) is equivalent to inf ρ ρ ρ,θ,ψ ψ ψ≥0 Next, derive equivalent linear constraints of the embedded minimization problem in constraint (40b). For fixed ρ ρ ρ and ψ ψ ψ, we re-write constraint (40b) as Letting a i,t , b i,t , g i,t , and o i,t be the dual variables associated with constraints (41b), (41c), (41d), and (41e), respectively, we present the linear program in (41a)-(41e) in its dual form as Replacing (40b) in (40) with (42a)-(42d), we derive the following equivalent reformulation of problem (40) (equivalently, problem sup P∈F(S,µ µ µ,η η η) CVaR(Q(x x x,W W W )) in (35)): Combining the inner problem sup P∈F(S,µ µ µ,η η η) CVaR(Q(x x x,W W W )) in the form of (43) and problem sup P∈F(S,µ µ µ,η η η) in the form of (12) with the outer minimization problem in (35), we derive the following equivalent reformulation of the MAD-CVaR model in (35). s.t. (x x x,y y y) ∈ X , ψ ψ ψ ≥ 0, (42a) − (42c). It is easy to verify that problem (44) is equivalent to where h(x x x,ρ ρ ρ,ψ ψ ψ) = max λ,v,W,π,k λ,v,W,π,k λ,v,W,π,k t∈T i∈I 8d), (10a) − (10b), (11c) from Section 5.1.1. Finally, we observe that the right-hand side (RHS) of constraints (45c) is equivalent to the RHS of constraints (12c) in the equivalent reformulation of the risk-neutral MAD-DRO model in (12). Therefore, we can easily adapt Algorithm 1 to solve (45) (see Algorithm 2). Algorithm 2: Decomposition algorithm for the MAD-CVaR Model. and record an optimal solution (x x x * ,ρ ρ ρ * ,ψ ψ ψ * , δ * ) and optimal value Z * . Set LB = Z * . 3. Sub-problem. 3.1. With (x x x,ρ ρ ρ,ψ ψ ψ) fixed to (x x x * ,ρ ρ ρ * ,ψ ψ ψ * ), solve the following problem h(x x x,ρ ρ ρ,ψ ψ ψ) = max λ,v,W,π,k λ,v,W,π,k λ,v,W,π,k t∈T i∈I and record optimal solution (π * , λ * , W * , v * , k * π * , λ * , W * , v * , k * π * , λ * , W * , v * , k * ) and h(x x x,ρ ρ ρ,ψ ψ ψ) * . then stop and return x x x * and y y y * as the optimal solution to problem (46) (equivalently, (45)). else add the cut δ ≥ t∈T i∈I The sample average deterministic equivalent of (48) based on N scenarios, W W W 1 , . . . ,W W W N , is as follows: where for each n ∈ [N ], Q(x x x,W W W n ) is the recourse problem defined in (4), and Θ ∈ [0, 1]/ Appendix J: Details of Lehigh County Instances the radius should not be too small. Otherwise, the problem may behave like sample average approximation and hence, losing the purpose of robustification. In particular, if we set the radius to zero, the ambiguity set shrinks to a singleton that contains only the nominal distribution, in which case the DRO problem reduces to an ambiguity-free SP (Mohajerin Esfahani and Kuhn 2018) . But, on the other hand, the radius should not be too large to avoid conservative solutions, which is one of the major criticism faced by traditional RO methods. Given that the true distribution P is possibly unknown, it is impossible to compute that minimizesÊ[Q(x x x( , N ),W W W )]. Thus, as detailed in Mohajerin Esfahani and Kuhn (2018) , the best we can hope for is to approximate opt using the training data set. As pointed out by Mohajerin Esfahani and Kuhn (2018) and Gao (2020) , practically, the radius is often selected via cross-validation. We employ the following widely used cross-validation method to estimate opt as in Jiang, Ryu, and Xu (2019) and Mohajerin Esfahani and Kuhn (2018) . First, for each N ∈ {10, 50, 100} and each ∈ {0.01, 0.02, . . . , 0.09, 0.1, . . . , 0.9, 1, . . . , 10} (i.e., a log-scale interval as in Mohajerin Esfahani and Kuhn (2018), Jiang, Shen, and Zhang (2017) , Tsang and Shehadeh (2021) ), we randomly partition the data into a training (N ) and testing set (N ). Using the training set, we solve W-DRO to obtain the optimal first-stage solution x x x( , N ) for each and N . Then, we use the testing data to evaluate these solutions by computingÊ P N [Q(x x x( , N ),W W W )] (where P N is the empirical distribution based on the testing data N ) via sample average approximation. That is, we solve the second-stage with x x x fixed to x x x( , N ) and N data and compute the corresponding second-stage costÊ P N [Q(x x x( , N ),W W W )]. Finally, we set best N to any that minimizesÊ P N [Q(x x x( , N ),W W W )]. We repeat this procedure 30 times for each N and set to the average of the best N across these 30 replications. We found that best N equals (7, 5, 2) when N =(10, 50, 100) for most instances. It is expected that decreases with N (see, e.g., Mohajerin Esfahani and Kuhn (2018) , Jiang, Ryu, and Xu (2019) , Tsang and Shehadeh (2021) ). Intuitively, a small sample does not have sufficient distributional information, and thus a larger produces distributionally robust solutions that better hedge against ambiguity. In contrast, with a larger sample, we may have more information from the data, and thus we can make a less conservative decision using a smaller value. While Wasserstein ambiguity sets offer powerful out-of-sample performance guarantees and enable practitioners to control the model's conservativeness by choosing , moment-based ambiguity sets often display better tractability properties. In fact, various studies provided evidence that DRO models with moment ambiguity sets are more tractable than the corresponding SP because the intractable high-dimensional integrals in the objective function are replaced with tractable (generalized) moment problems (see, e.g., Mohajerin Esfahani and Kuhn (2018), Delage and Ye (2010) , Goh and Sim (2010) , Wiesemann, Kuhn, and Sim (2014) ). In contrast, DRO models with Wasserstein ambiguity sets tend to be more computationally challenging than some moment-based DRO model and their SP counterparts, especially when N is large. In this paper, we obtained similar observations. For a detailed discussion we refer to Mohajerin Esfahani and Kuhn (2018) and references therein. A survey of healthcare facility location The multi-period incremental service facility location problem A location model for urban hierarchy planning with population dynamics Distributionally robust facility location problem under decision-dependent stochastic demand Deriving robust counterparts of nonlinear uncertain inequalities More bounds on the expectation of a convex function of a random variable Approximation of expected returns and optimal decisions under uncertainty using mean and mean absolute deviation Locating service facilities whose reliability is distance dependent The price of robustness Use of an innovative design mobile hospital in the medical response to Hurricane Katrina Mobile health is worth it! economic benefit and impact on health of a population-based mobile screening program in New Mexico Meet cate Pennsylvania's first mobile vaccination unit A robust learning approach for regression models based on distributionally robust optimization Robust stochastic optimization made easy with rsome The dynamic uncapacitated hub location problem The maximum covering/shortest path problem: A multiobjective network design and routing formulation The value of randomized solutions in mixed-integer distributionally robust optimization problems Distributionally robust optimization under moment uncertainty with application to data-driven problems A review of competitive facility location in the plane Facility location when demand is time dependent Mobile health units in emergency operations: A methodological approach (Humanitarian Practice Network Distributionally robust two-stage stochastic programming The multi-vehicle cumulative covering tour problem 2020 Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality Distributionally robust stochastic optimization with Wasserstein distance The covering tour problem Which households are most distant from health centers in rural China? Evidence from a GIS network analysis Distributionally robust optimization and its tractable approximations Heuristics for the multi-vehicle covering tour problem The mobile facility routing problem Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls Dynamic facility location with generalized modular capacities Lagrangian heuristics for large-scale dynamic facility location with generalized modular capacities Data-driven chance constrained stochastic program Data-driven distributionally robust appointment scheduling over Wasserstein balls Integer programming approaches for appointment scheduling with random no-shows and service durations A multicut L-shaped based algorithm to solve a stochastic programming model for the mobile facility routing and scheduling problem A two-stage robust optimization approach for the mobile facility fleet sizing and routing problem under uncertainty Mobile clinic market to grow six-fold over 10 years Conditional value-at-risk in portfolio optimization: Coherent but fragile Distributionally robust optimization with decision dependent ambiguity sets Models and algorithms for distributionally robust least squares problems Data-driven distributionally robust polynomial optimization Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations Calculating the return on investment of mobile healthcare Robust portfolio choice with CVaR and VaR under distribution and mean return ambiguity Robust optimization with ambiguous stochastic constraints under mean and dispersion information Distributionally robust optimization: A review The Law of Retail Gravitation Conditional value-at-risk for general loss distributions Optimization of conditional value-at-risk Data-driven distributionally robust capacitated facility location problem Value-at-risk vs. conditional value-at-risk in risk management and optimization. State-of-the-art decision-making tools in the information-intensive age Analysis of models for the stochastic outpatient procedure scheduling problem Distributionally robust facility location with bimodal random demand A distributionally robust optimization approach for location and inventory prepositioning of disaster relief supplies On general minimax theorems The optimizer's curse: Skepticism and postdecision surprise in decision analysis Mobile clinic in massachusetts associated with cost savings from lowering blood pressure and emergency department use Convex programming with set-inclusive constraints and applications to inexact linear programming Robust optimization of a broad class of heterogeneous vehicle routing problems under demand uncertainty The bi-objective stochastic covering tour problem Stochastic optimization models for a home service routing and appointment scheduling problem with random travel and service times Distributionally robust control of constrained stochastic systems From data to decisions: Distributionally robust optimization is optimal A dual-based procedure for dynamic facility location What causes post-decision disappointment? Estimating the contributions of systematic and selection biases Distributionally robust hub location Two-stage distributionally robust programming based on worst-case mean-cvar criterion and application to disaster relief management A distributionally robust optimization approach for surgery block allocation Distributionally robust convex optimization Lehigh County, Pennsylvania -Wikipedia, the free encyclopedia An approximation algorithm for the two-stage distributionally robust facility location problem. Advances in Global Optimization Ambiguous chance-constrained binary programs under mean-covariance information We want to thank all colleagues who have contributed significantly to the related literature. We are grateful to the anonymous reviewers for their insightful comments and suggestions that allowed us to improve the paper. Special thanks to Mr. Man Yiu Tsang (a Ph.D. student at the Department of Industrial and Systems Engineering, Lehigh University) for helping with Figure 2 and proofreading the paper. Dr. Karmel S. Shehadeh dedicates her effort in this paper to every little dreamer in the whole world who has a dream so big and so exciting. Believe in your dreams and do whatever it takes to achieve them-the best is yet to come for you. Recall from the definition of the support set that the lowest demand of each customer i in period t equals to the integer parameter W i,t . Now, if we treat the MFs as uncapacitated facilities, then we can fully satisfy W i,t at the lowest assignment cost from the nearest location j ∈ J , where J := {j : x t j,m = 1}. Note that J ⊆ J. Thus, the lowest assignment cost must be at least equal to or larger than i∈I minIf γ < min j∈J {βd i,j }, then the recourse must be at least equal to or larger than i∈I γW i,t . Accordingly, i∈I min{γ, minAppendix I: Sample Average ApproximationAppendix L: Additional CPU Time Results