key: cord-0623379-mepmmtjl
authors: Yang, Yue; Bao, Wencang; Ramezani, Mohsen; Xu, Zhe
title: Real-time and Large-scale Fleet Allocation of Autonomous Taxis: A Case Study in New York Manhattan Island
date: 2020-09-06
journal: nan
DOI: nan
sha: 236320ffc82960e12778b42eb6a3f799634246a7
doc_id: 623379
cord_uid: mepmmtjl

Autonomous taxis have become a highly promising transportation mode that helps relieve traffic congestion and avoid road accidents. However, the wide implementation of this service is hindered by the failure of traditional models to efficiently allocate the available fleet: they struggle with the imbalance between supply (autonomous taxis) and demand (trips), poor cooperation among taxis, resource constraints that are hard to satisfy, and the requirements of on-line platforms. To address these problems from a global and more farsighted view, we employ a Constrained Multi-agent Markov Decision Process (CMMDP) to model fleet allocation decisions; it can be split into sub-problems formulated as dynamic assignment problems that combine both immediate rewards and future gains. We also leverage a Column Generation algorithm to guarantee efficiency and optimality at a large scale. Through extensive experiments, the proposed approach not only achieves remarkable improvements over state-of-the-art benchmarks in terms of individual efficiency (a 12.40% rise in income and a 6.54% rise in utilization) and platform profit (a 4.59% increase), but also reveals a time-varying fleet adjustment policy that minimizes the operation cost of the platform.

The autonomous vehicle industry is developing rapidly. GM CRUISE, Tesla, Waymo and Daimler have tested their autonomous taxis on the road, and more than ten global automakers have launched or planned programs regarding self-driving cars. The autonomous taxi, as a result, has become an emerging topic in urban transportation. On the one hand, it alleviates the burden of city congestion and increases road capacity (Greenblatt and Saxena 2015); on the other hand, it can reduce traffic accidents by preventing human errors. Furthermore, as a means of public transportation, the driverless taxi can limit passengers' exposure to COVID-19, which makes it more attractive during a pandemic crisis. Numerous predictions and simulations suggest that self-driving taxis are likely to become a reality in the foreseeable future (Litman 2014).

Equipped with sensing and communication tools, each self-driving taxi can periodically upload its real-time position and status (i.e., vacant or occupied) to the platform. In turn, during each time slot (i.e., one minute), the platform collects the arriving trips and makes assignments. Under these circumstances, optimal control algorithms play an important role in helping self-driving platforms balance demand and supply. One representative task is order dispatching, where various methods (Lowalekar, Varakantham, and Jaillet 2018; Xu et al. 2018) have been studied to minimize the supply-demand mismatch, reduce passengers' waiting time, and lower fuel cost; another key application is redistributing available vehicles ahead of time to reduce the gap between demand and supply (Nair and Miller-Hooks 2011).
If available taxis are precisely directed to locations with high demand, the order fulfillment rate improves, which benefits all aspects of the system: higher utilization of mobility resources, greater efficiency for each taxi, and a larger Gross Merchandise Volume (GMV) for the platform. As illustrated in Figure 1, five vacant taxis at the current time step (A) are guided to potential pick-up locations to fulfill the demands of the next step (B).

Even though abundant historical demand and digital trajectories can be collected by mobility-on-demand platforms (e.g., Uber, Lyft and DiDi), allocating their available fleets is not an easy task. One major issue is that allocation policies at the immediate step exert significant influence on the spatio-temporal distribution of future supply and demand, so a long-term optimization strategy is needed. Besides, demand changes dramatically over the hours of a day, so it is crucial to determine in advance how many vehicles should actually be deployed. Resource restrictions, often ignored by traditional models, also affect agents' decisions significantly. Finally, allocating thousands of autonomous taxis in one round poses a great challenge to the computational efficiency and optimality of optimization approaches. To address these challenges, we propose a novel Constrained Multi-agent Markov Decision Process (CMMDP) model and design a Column Generation algorithm to accelerate solving and retain tractability. Extensive experiments on a simulation system further show that the proposed algorithm remarkably improves individual efficiency and platform profit. The highlights of this work are summarized as follows:

• We formulate the large-scale fleet allocation problem in a CMMDP setting, which considers long-term payoffs and immediate resource constraints simultaneously.
• To accommodate real-time requirements, the CMMDP is further split into dynamic assignment sub-problems, and each decision is scored by a combination of instant and learned future rewards.
• Facing the complexity of the large-scale allocation problem, we employ the Column Generation algorithm to achieve computational efficiency and optimality, which shows great potential for deployment in industrial applications.
• Last but not least, the proposed model significantly outperforms state-of-the-art fleet allocation methods with regard to individual efficiency and platform profit.

The remainder of the paper is structured as follows. Section 2 reviews existing work related to fleet allocation and CMMDP. Section 3 defines important concepts and the real-time fleet allocation framework. Section 4 introduces the solving approach for our framework. We evaluate the performance of the proposed approach in Section 5 and close with concluding insights in Section 6.

This section gives an overview of related work at a general and a specific level. At the general level, we summarize approaches to the resource allocation problem, the generalized version of fleet allocation. At the specific level, we focus on one methodology, CMMDP, and analyze its variants.

Autonomous taxi fleet allocation is essentially a resource allocation problem, that is, assigning available resources to demands under time, cost and other limitations.
For taxi allocation planning, a traditional approach is to generate recommendations of the immediate next movement for each driver through models with fixed structures. Such work covers route recommendation and competition among taxis (Qian et al. 2015), increasing profitability by reducing cruising miles (Powell et al. 2011), and analyzing the factors behind high-revenue locations (Hwang, Hsueh, and Chen 2015). Nevertheless, these studies easily miss the long-run, globally optimal solution because of uncertainties, especially in large-scale instances. Recently, network-based models such as MDPs and Reinforcement Learning (RL) have emerged to address this problem. These models make real-time decisions under uncertainty, and their flexible structure allows them to be extended with little effort. David J. Musliner analyzes the interactions that arise when an MDP is extended to a multi-agent model. Robbel, Oliehoek, and Kochenderfer introduce a property called "anonymous influence" to measure the computational-efficiency benefits of variable elimination. Xu et al. formulate customer-taxi matching as a sequential decision-making problem to optimize instant and long-term rewards. Mao, Liu, and Shen estimate policy and value functions in a model-free deep RL framework. Still, relieving the computational burden in hyper-scale cases remains an open problem.

A Constrained Multi-agent Markov Decision Process is a multi-agent MDP with resource constraints, which often affect agents' decisions in practical use. Beynier and Mouaddib assess the interdependence of agents under resource and temporal constraints. Matignon, Jeanpierre, and Mouaddib study sequential decision making for multi-robot collaboration under communication constraints. Erwin Walraven considers road capacity constraints in taxi repositioning plans. Rather than analyzing expectations, De Nijs et al. bound the probability of resource constraint violations. Fowler et al. discuss the performance of MDPs with action, reward and probabilistic satisfaction constraints. The main challenge in CMMDP is lowering the computational complexity. One idea is to decompose multi-agent decision making into individual decisions. Beynier and Mouaddib let each agent decide independently and account for the loss that a local decision imposes on other agents. Another idea, which is also applied in our paper, harnesses the column generation algorithm. Instead of solving an individual agent's policy or action at a time, column generation outputs solutions for all agents at once. Erwin Walraven illustrates the column generation algorithm for CMMDPs. De Nijs and Stuckey combine column generation and CMMDP to investigate the balance between efficiency and safety in policy relaxation. However, few CMMDP methods in the fleet allocation field explicitly address real-time decision making, which is of high importance for allocation efficiency, profit and customer satisfaction. In light of this, our model divides the CMMDP into sub-problems to obtain better and quicker real-time allocation.

To model sequential decision making under uncertainty, we employ a Markov Decision Process (MDP) (Bellman 1957) in this section. Specifically, each autonomous taxi is defined as an agent with a spatial action set (movements); the whole operating area is formulated as a network G = (S, E), where S denotes the set of locations and E is the set of edges e(l, l′) connecting two locations l and l′.
After that, we address the resource limitation problem in multi-agent environments (Boutilier 1996) through CMMDP, in which the resource constraints are the maximum location capacities. To meet on-line execution requirements, we split the CMMDP into multiple sub-problems combining both immediate rewards and future gains.

As a stepping stone, a multi-agent MDP consisting of N independent agents can be defined through the single-agent formulation ⟨S, A, P, R, T⟩, where only one agent interacts with the environment:

Time horizon T, the set of operating time steps {1, ..., T}.

State space S, a finite set of local states; each state s ∈ S is the current location of the agent.

Action subspace A(s), the candidate action set for an agent at location s. Here, we develop a searching scheme for computing an approximately feasible action space for s, namely A(s) = {s′ ∈ S : τ(s, s′) ≤ δ}, where τ(s, s′) denotes the shortest travel time from s to s′, which can be computed directly with Dijkstra's algorithm (Dijkstra et al. 1959) on the network G, and δ is the travel time threshold limiting the search space of candidates. As an example, Figure 2 searches for the actions of s_0 = l_0; the resulting action space is A(s_0) = {l_0, l_1, l_2, l_3}.

Global action space A, the joint space of all A(s), ∀s ∈ S.

Matching probability p_m(t, s) estimates the probability that a vacant taxi is matched to a trip. Following other scholars' work (Buchholz 2018; Yu et al. 2019), the matching probability from a taxi's view is given in Eq. 2, where n_trip(t, s) and n_taxi(t, s) denote the number of trips and taxis at s = l at step t, respectively. In addition, we introduce a parameter θ ∈ (0, 1] to describe how the matching probability changes with the demand-supply ratio n_trip(t, s)/n_taxi(t, s).

Destination probability p_d(t, s, s′) measures the likelihood that the destination of a trip is state s′ when the passenger is picked up at state s at step t. This parameter can be estimated from the abundant historical trip data as p_d(t, s, s′) = n_dest(t, s, s′) / Σ_{s″∈S} n_dest(t, s, s″), where n_dest(t, s, s′) is the number of trips starting from s to s′ at step t.

State transition probability P(t, s, a, s′) is the probability that state s′ will be reached when action a is taken in state s at step t. Suppose a taxi is located at state s = l at step t and takes action a = l′, arriving at l′ at step t′ = t + τ(l, l′); then:

Figure 2: An example of the state transition process. A taxi starts from s_0 = l_0 at step t = t_0 and takes action a = l_1, arriving at s_1 = l_1 at step t_0 + τ(l_0, l_1). It then finds a trip heading for l_4 and consequently arrives at s_4 = l_4 at step t_0 + τ(l_0, l_1) + τ(l_1, l_4).

• If it successfully finds a trip from s′ = l′ to l″, it transfers directly to state s″ = l″, and the transition probability can be written as P(t, l, l′, l″) = p_m(t′, l′) · p_d(t′, l′, l″).
• If it fails to find a trip, it continues repositioning from s′ = l′, and the transition probability is P(t, l, l′, l′) = 1 − p_m(t′, l′).

To illustrate the state transition process, Figure 2 presents a transition scenario s_0 → s_4. In this case, the transition probability P(t_0, s_0, l_1, s_4) equals p_m(t_0 + τ(l_0, l_1), s_1) · p_d(t_0 + τ(l_0, l_1), s_1, s_4). In particular, if the taxi takes an action that is not directly linked to its current location in the network (e.g., l_2 in Figure 2), it will take the shortest route, which passes through at least two locations (i.e., taking l_2 as an action in Figure 2 means traveling along the shortest route l_0 → l_1 → l_2).
Under this circumstance, the state transition process would be rather complicated, since each passing location would have to be taken into consideration. For simplicity, the possible transitions at intermediate locations are neglected: we assume no trips are assigned to the taxi until it arrives at the assigned destination, and we only consider the state transition happening at the final location.

Immediate reward R(t, s, a) is a reward model that maps step t, state s and action a to a reward for the agent. Intuitively, the immediate reward can be set as the payment of a trip, so that the goal of the system is to maximize the Gross Merchandise Volume (GMV) of the platform.

Optimal expected reward V*(t, s) is the maximal cumulative expected reward of the agent over the remaining time steps. Hence, the optimal expected reward starting from state s at step t is

V*(t, s) = max_{a∈A(s)} Q(t, s, a),    (6)

where Q(t, s, a), in the Bellman form (Bellman 1957), is the expected reward that a taxi can earn by starting from s and taking action a at step t:

Q(t, s, a) = Σ_{s′∈S} P(t, s, a, s′) [ R(t, s, a) + γ V*(t′, s′) ],    (7)

with t′ the arrival step at s′. Here, γ is the discount factor, set as a constant slightly smaller than 1 to ensure the existence of a finite optimal expected payoff. Given the optimal value V*(t, s), an optimal single-agent policy π* simply acts greedily:

π*(t, s) = argmax_{a∈A(s)} {Q(t, s, a)}.    (8)

Since the optimal values of V and Q at the terminal step t = T are known, we can solve for the optimal values V*, Q^{π*} and the policy π* with a dynamic programming approach (Bertsekas et al. 1995), as summarized in Algorithm 1. However, this changes when resource constraints and interactions among individuals are added to the problem.

Algorithm 1: Dynamic programming for the single-agent MDP
Input: state set S, action set A, transition probabilities P, immediate reward R, terminal step T
Output: optimal values V*, Q^{π*} and optimal policy π*
1: Initialize a hash table V* to store optimal state values
2: for t = T − 1, ..., 1 do
3:   for s ∈ S do
4:     Compute V*(t, s) and Q(t, s, a) based on Eqs. 6-7
5:     Update the optimal policy π*(t, s) based on Eq. 8
6:   end for
7: end for
8: return V*, Q^{π*}, π*
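As a concrete illustration of Algorithm 1, the sketch below implements the backward recursion in Python over a tabular state space. It is a minimal sketch under our own simplifying assumptions: states and actions are plain location ids, terminal values are fixed to zero, and the transition model is supplied as a function returning (next state, arrival step, probability) triples. Names such as `backward_dp`, `transitions` and `reward` are illustrative placeholders, not from the paper.

```python
# Minimal sketch of Algorithm 1 (backward dynamic programming) under our own
# simplifying assumptions: tabular states, a user-supplied transition model, and
# terminal values fixed to zero. All names here are illustrative placeholders.

def backward_dp(states, actions, transitions, reward, T, gamma=0.99):
    """Compute V*(t, s), Q(t, s, a) and a greedy policy pi*(t, s).

    states      : iterable of location ids
    actions     : dict s -> candidate action set A(s)
    transitions : f(t, s, a) -> list of (s_next, t_next, prob), probs summing to 1
    reward      : f(t, s, a) -> expected immediate reward R(t, s, a)
    T           : terminal step; V*(T, s) is assumed known (here: 0)
    """
    V = {(T, s): 0.0 for s in states}
    Q, pi = {}, {}
    for t in range(T - 1, -1, -1):            # sweep backward in time
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions[s]:
                q = 0.0
                for s_next, t_next, p in transitions(t, s, a):
                    # Bellman backup: immediate reward plus discounted future value.
                    t_next = min(t_next, T)
                    q += p * (reward(t, s, a) + gamma * V.get((t_next, s_next), 0.0))
                Q[(t, s, a)] = q
                if q > best_q:
                    best_a, best_q = a, q
            V[(t, s)] = best_q if best_a is not None else 0.0
            pi[(t, s)] = best_a
    return V, Q, pi
```

Because the recursion sweeps from t = T − 1 down to 1, every future value V(t′, s′) with t′ > t is already available when Q(t, s, a) is evaluated, mirroring the order of the loops in Algorithm 1.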
Constrained Multi-agent Planning

So far we have discussed the MDP from the angle of one individual agent without any constraints. However, the single-agent model captures neither the competition among multiple agents nor the resource constraint on each action. As an example similar to our problem, the number of vehicles taking a particular road at any given time is limited by the capacity of the road, i.e., a 3-lane road segment fits at most 3 cars side by side. Back in our setting, the number of taxis heading for a particular location at any given time is limited by the number of trips generated at this location. To model such a problem, we consider a large class of decision-making problems involving multiple independent agents with resource constraints, which impose restrictions on the actions that the agents can execute.

In the first place, we construct a Multi-agent MDP (MMDP) ⟨N, S, A, P, R, T⟩ by extending the single-agent framework to a set of N agents:

Agents N, N = {1, ..., N}, the set of agents.

State space S, S = S_1 × S_2 × ... × S_N, the factorised state space, i.e., the Cartesian product of the local state spaces S_i, one per agent.

Action space A, A = A_1 × A_2 × ... × A_N, the joint action space, i.e., the Cartesian product of the local action spaces A_i, one per agent.

State transition function P, P = ∏_i P_i, the product of the independent transition functions P_i of the agents.

Immediate reward R, the total immediate reward, i.e., the sum of the agents' individual rewards.

Resource constraints (Altman 1999) force the agents to coordinate their decisions; that is, the policies used by the agents should stay below the resource limits. In our work, we achieve this by computing, at each step, the maximum number of taxis that each action (location) can accommodate. Given an estimated trip quantity n̂_trip(t + τ(s, a), a) and a lower-bound matching probability p̂_m ∈ [0, 1), which guarantees that taking action a at step t will not result in a matching probability worse than the lower bound, the maximum capacity Δ_{t,a} of action a at step t can be derived from Eq. 2 (Eq. 10), where θ̂ is the parameter of Eq. 2 estimated from historical data.

In the second place, we introduce a variable x(i, t, s, a) representing the probability that agent i in state s executes action a at step t. The Constrained Multi-agent Markov Decision Process (CMMDP) formulation corresponds to the linear program (De Nijs 2019) presented in Eq. 11, which maximizes the sum of the expected rewards of all agents while ensuring that the expected resource consumption respects the resource limits. In particular, the flow of probability is made consistent with the transition function through the first and second constraints, and the resource usage restrictions are satisfied in expectation through the third constraint. The Linear Program (LP) in Eq. 11 computes a solution that maximizes the expected value of the agents' joint policy under the resource constraints, but it requires optimizing a single large centralized program. The LP can quickly grow very large, having O(N · T · |S_i| · |A_i|) variables, which makes finding optimal policies intractable in general. Moreover, the CMMDP in Eq. 11 has a fundamental drawback: it meets the resource constraints only in expectation. To downscale the number of optimization variables and guarantee strict adherence to the resource constraints, we further introduce a binary variable y(i, t, s, a) ∈ {0, 1}, indicating whether agent i takes action a from state s at step t, and reformulate the CMMDP in a Mixed-Integer Linear Programming (MILP) formalism in the following section.

From the perspective of the platform, real-time situations and on-line implementation should be considered in the optimization framework, which naturally falls into the category of the 'dynamic assignment problem' (Spivey and Powell 2004). Therefore, the LP of Eq. 11 is split into sub-problems, each a 'general assignment problem' (Fisher and Jaikumar 1981), at every time step t, to downscale the variable size and adapt to on-line implementation (Eq. 12). At each time step t, the current state s_i of each agent i is collected, and the goal of the online optimization algorithm is to determine the best matching between agents and candidate actions so as to maximize the sum of expected rewards Q^{π*_i}(t, s_i, a). Specifically, the first constraint enforces the resource limit Δ_{t,a} imposed on a given action a at step t, and the second constraint guarantees that each agent is assigned at most one action at a time.
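For concreteness, the display below sketches one plausible way to write the per-step assignment program just described, using the binary variables y(i, t, s_i, a) and capacities Δ_{t,a} introduced earlier. Since Eq. 12 itself is not reproduced in this preprint, the notation is our reconstruction from the surrounding text and may differ from the paper's exact formulation in minor details.

```latex
% Sketch of the per-step assignment sub-problem described above (cf. Eq. 12).
% Reconstructed from the surrounding text; indices and notation are ours.
\begin{aligned}
\max_{y}\quad & \sum_{i \in N}\ \sum_{a \in A(s_i)} Q^{\pi^*_i}(t, s_i, a)\, y(i, t, s_i, a) \\
\text{s.t.}\quad & \sum_{i \in N} y(i, t, s_i, a) \le \Delta_{t,a}, \qquad \forall a \in A, \\
& \sum_{a \in A(s_i)} y(i, t, s_i, a) \le 1, \qquad \forall i \in N, \\
& y(i, t, s_i, a) \in \{0, 1\}, \qquad \forall i \in N,\ a \in A(s_i).
\end{aligned}
```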
Note that Q^{π*_i}(t, s_i, a) represents an expected reward in a multi-agent system in which each agent i has its own policy π_i, and how to tackle large-scale planning among self-interested agents is still an open issue. However, since all autonomous taxis in our setting are entirely controlled by the centralized platform, we can restrict all agents to share a common interest in maximizing the global gain, so that they all follow the same policy π* from the single-agent setting. Intuitively, Q^{π*}(t, s_i, a), a learned expected reward function, can be computed by Algorithm 1 from historical data. In much previous research on MDP-based methods (Yu et al. 2019; Shou et al. 2020), the optimal policy derived from historical episodes is used directly to guide taxis in the current episode, which overlooks the particularities and characteristics of the real-time data. To capture more information in real-time planning, we design a modified reward value U(i, t, s_i, a) combining the learned reward function Q^{π*}(t, s_i, a) and the real-time reward R̂(t, s_i, a), where R̂(t, s_i, a) is the instant reward estimated at step t, equivalent to the average payment earned by a taxi taking action a. Figure 3 illustrates the proposed procedure, which combines an off-line learning step with an on-line assignment step. The objective of Eq. 12 is then modified to use U(i, t, s_i, a) in place of Q^{π*_i}(t, s_i, a).

Solving Approaches for the Large-scale Setting

Although the problem in Eq. 12 decouples the constraints and enables us to solve small-scale instances in reasonable time (e.g., by employing available minimum-cost flow algorithms (Bang-Jensen and Gutin 2008)), it becomes computationally expensive as the number of variables grows. To solve larger-scale problems efficiently, we develop a column generation procedure that can easily be adapted to real-world cases. Conceptually, Column Generation is an effective technique for decomposing combinatorial optimization problems, such as the cutting-stock problem (Gilmore and Gomory 1961), the graph coloring problem (Mehrotra and Trick 1996), and the vehicle routing problem (Desrosiers, Soumis, and Desrochers 1984). The basic framework reformulates the assignment problem as a Restricted Master Problem (RMP) and then chooses entering columns by solving pricing sub-problems.

Let {Y^k_{a_j} : a_j ∈ A, 1 ≤ k ≤ K_{a_j}} be the set of all feasible assignments (columns) for all actions (locations), where K_{a_j} is the number of feasible assignments for action a_j. Each column Y^k_{a_j} = {y^k(1, t, s_1, a_j), ..., y^k(N, t, s_N, a_j)}, 1 ≤ k ≤ K_{a_j}, is a group of assignment decisions satisfying the resource constraint of a_j. Let λ^k_{a_j} ∈ {0, 1} be a binary variable indicating whether column Y^k_{a_j} is selected for action a_j. The assignment problem is then formulated as the Restricted Master Problem (RMP) in Eq. 17, where the first set of constraints enforces that each agent takes at most one action and the second set enforces that at most one column is selected for each action. The RMP cannot be solved directly because of the exponential number of columns, so we start from a small subset of columns. Whether additional columns join the subset is decided by solving the pricing problem of each action a_j, which can be reformulated as a knapsack problem. With μ_i and ν_j the optimal dual prices of the first and second sets of constraints in the RMP, respectively, the pricing problem of a specific action a_j, ∀a_j ∈ A, is defined in Eq. 18. Consequently, if the objective values of all pricing problems are less than or equal to zero, the current optimal solution of the RMP is also optimal for the unrestricted master problem (Savelsbergh 1997).
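Because each selected taxi occupies exactly one unit of an action's capacity Δ_{t,a}, the knapsack-style pricing problem has a particularly simple structure under that unit-consumption assumption: it reduces to selecting the agents with the largest positive reduced-cost contributions up to the capacity limit. The Python sketch below illustrates this idea; the function and variable names (price_column, U, mu, nu_j) are ours for illustration and are not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation) of one pricing step for a
# single action a_j, assuming each assigned taxi consumes exactly one unit of the
# capacity Delta_{t, a_j}. Under that assumption the knapsack reduces to picking
# the agents with the largest positive reduced-cost contributions U(i) - mu[i].

def price_column(agents, U, mu, nu_j, capacity):
    """Return (reduced_cost, column) for one action a_j.

    agents   : list of agent ids i whose action set contains a_j
    U        : dict i -> combined reward U(i, t, s_i, a_j)
    mu       : dict i -> dual price of agent i's constraint in the RMP
    nu_j     : dual price of the "one column per action" constraint for a_j
    capacity : Delta_{t, a_j}, maximum number of taxis the action can absorb
    """
    # Per-agent contribution to the column's reduced cost, best first.
    gains = sorted(((U[i] - mu[i], i) for i in agents if U[i] - mu[i] > 0.0),
                   reverse=True)
    chosen = [i for _, i in gains[:capacity]]
    reduced_cost = sum(g for g, _ in gains[:capacity]) - nu_j
    return reduced_cost, chosen


if __name__ == "__main__":
    # Toy numbers: four candidate taxis, capacity 2.
    agents = [0, 1, 2, 3]
    U = {0: 5.0, 1: 3.5, 2: 1.0, 3: 4.2}
    mu = {0: 2.0, 1: 4.0, 2: 0.5, 3: 1.0}
    rc, column = price_column(agents, U, mu, nu_j=1.5, capacity=2)
    # A positive reduced cost means the column (the chosen taxi -> a_j
    # assignments) should enter the restricted master problem.
    print(rc, column)
```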
Algorithm 2 summarizes the steps of our approach to the large-scale fleet allocation problem.

Algorithm 2: Column generation for the large-scale assignment problem
1: Initialize the RMP in Eq. 17 with a small subset of feasible columns
2: repeat
3:   Solve the RMP and obtain the dual prices μ_i, ν_j
4:   for a_j ∈ A do
5:     Construct the pricing problem in Eq. 18
6:     Obj, Y_new ← solve the pricing problem
7:     if Obj > 0 then
8:       Add the column Y_new of action a_j to the RMP
9:     end if
10:   end for
11: until no pricing problem has a positive objective
12: return the assignment induced by the selected columns

To provide a more complete understanding of the effectiveness of our model, we utilize large-scale historical taxi data from Manhattan Island to evaluate the proposed method. The data used in this work were collected from taxi data sets in Manhattan Island, New York, covering November 1st to 20th.

To assess the performance of our planning framework, a Monte Carlo simulation is implemented. In the simulation phase, 3,000 autonomous taxis are randomly generated at 00:00 am each day, and each day is treated as an episode segmented into 10-minute windows (T = 144). We use the first 15 episodes for learning and evaluate on the following 5 episodes. All quantitative fleet allocation results in this section are averaged over ten runs. We adopt the three evaluation metrics below:

• Income Per Hour (IPH). The average IPH represents the efficiency of the taxi allocation as a whole.
• Occupied Rate (OR). The occupied rate of a taxi is the ratio of the time spent carrying a trip to the total operating time of the taxi.
• Gross Merchandise Volume (GMV). GMV records the total payment of all trips served by taxis in the system.

Additionally, we set the following hyper-parameters: the action searching threshold δ = 10 min, the matching parameter θ = 0.94 fitted by logistic regression on real-world data, and the lower-bound probability p̂_m, which takes the time-dependent values given empirically in Table 1.

To evaluate the optimal policy from our real-time allocation model, we compare against two baseline heuristics (a minimal sampling sketch of the first baseline appears at the end of this subsection):
1. Hotspot Walk essentially defines a stochastic policy π_i for each agent i, in which actions are chosen at each step t with probability Pr_i(t, s_i, π_i(t, s_i) = a) computed from the trip quantity of the candidate locations.
2. DiDi Repositioning (Xu et al. 2020) employs a centralized optimization framework to determine the repositioning task (next pick-up location) of each taxi. Each task is assigned by solving a mixed-integer linear programming (MILP) model similar to Eq. 12, with the reward function U(i, t, s_i, a) replaced by the partial derivative of Eq. 2 with respect to n_taxi.

Table 2 compares the average IPH, OR and GMV of Hotspot Walk, DiDi Repositioning and the optimal policy from our approach. On average, our policy provides the best solution, with 12.40% higher average IPH, 6.54% larger average OR and 4.59% better GMV than Hotspot Walk. DiDi Repositioning ranks next, leading Hotspot Walk by 5.27% in average IPH, 3.07% in average OR and 3.56% in GMV. In summary, the proposed method shows the best overall performance.

Figure 4: Distribution of the Income Per Hour.

In detail, Figures 4 and 5 show the IPH and OR probability distributions of the three models, respectively. Consistent with the analysis above, the optimal policy of our approach achieves the best average IPH and OR, reflecting that it provides a better utilization rate of the taxi resource and shows great potential for practical use.
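As referenced above, the snippet below sketches the Hotspot Walk baseline in Python under the assumption that each candidate action is sampled with probability proportional to the estimated trip count at its arrival step; the names (hotspot_walk_action, n_trip_est, travel_time, candidate_actions) are illustrative placeholders, not the paper's code.

```python
import random

# Minimal sketch of the Hotspot Walk baseline, assuming the sampling probability
# of each candidate action is proportional to its estimated trip count.

def hotspot_walk_action(t, s, candidate_actions, n_trip_est, travel_time):
    """Sample the next location for a vacant taxi in state s at step t."""
    weights = []
    for a in candidate_actions(s):
        arrival = t + travel_time(s, a)
        weights.append((a, max(n_trip_est(arrival, a), 0.0)))
    total = sum(w for _, w in weights)
    if total == 0.0:                    # no demand anywhere nearby: stay put
        return s
    r = random.uniform(0.0, total)      # roulette-wheel sampling over trip counts
    acc = 0.0
    for a, w in weights:
        acc += w
        if r <= acc:
            return a
    return weights[-1][0]
```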
Having obtained encouraging results for the optimal policies in simulation, we are also interested in adjusting the fleet rather than maintaining a fixed size throughout the day, because dynamically adjusting the fleet can reduce unnecessary cost and improve social welfare, for example by mitigating traffic congestion and conserving fuel. Figure 6 depicts heatmaps of the taxis' matching probability at 05:00 am and 06:00 pm; the matching probability at 05:00 am is significantly weaker than at 06:00 pm. The main factor behind this conspicuous difference is the oversized fleet serving scarce demand at 05:00 am. Therefore, it is essential for the platform to dynamically adjust the fleet size to avoid waste. Based on the simulation results, Figure 7 compares the available fleet and the actually required fleet at each step t. The green curve shows the available fleet size and the red curve the required size, providing a guideline for controlling the fleet size. In particular, the platform needs to downscale its fleet in the early morning, 00:00 am-08:00 am (t ∈ [0, 48]), because of off-peak conditions. In sharp contrast, the required fleet outstrips the available taxis during the evening peak, 04:40 pm-08:00 pm (t ∈ [100, 120]), so the platform should operate more vehicles than initially deployed.

In this paper, we propose a novel real-time fleet allocation framework that provides effective redistribution of autonomous taxis on self-driving platforms. We model the allocation problem as a CMMDP and then divide it into multiple assignment problems for online implementation, in which the value of each assignment pair is computed as the sum of an instant reward and a long-term expected value learned from historical data. Furthermore, the real-world large-scale assignment is solved more efficiently with the column generation algorithm. Through extensive experiments on the simulator, the proposed algorithm remarkably improves individual efficiency and platform profit, and also reveals a time-varying fleet adjustment policy that reduces operation cost.

In this work, we assume taxis move to their next-step destinations immediately and ignore the routing paths; investigating path decisions has substantial practical value (Yu et al. 2019; Tang et al. 2020). Also, the fleet adjustment experiment suggests that a dynamic fleet size delivers lower cost and higher utilization, but models for dynamic fleet size adjustment, such as determining when, where and which taxis should join and exit operation, are not covered in this work. Including such models in the future would be beneficial.
References

Constrained Markov decision processes
Digraphs: theory, algorithms and applications
A Markovian decision process
Dynamic programming and optimal control
A polynomial algorithm for decentralized Markov decision processes with temporal constraints
An iterative algorithm for solving constrained decentralized Markov decision processes
Planning, learning and coordination in multiagent decision processes
Spatial equilibrium, search frictions and dynamic efficiency in the taxi industry
Coordinated Plan Management Using Multiagent MDPs. The Association for the Advancement of Artificial Intelligence
Resource-constrained multi-agent Markov decision processes
Risk-Aware Conditional Replanning for Globally Constrained Multi-Agent Sequential Decision Making
Routing with time windows by column generation
A note on two problems in connexion with graphs
Column Generation Algorithms for Constrained POMDPs
A generalized assignment heuristic for vehicle routing
Constrained-Action POMDPs for Multi-Agent Intelligent Knowledge Distribution
A linear programming approach to the cutting-stock problem
Autonomous taxis could greatly reduce greenhouse-gas emissions of US light-duty vehicles
An effective taxi recommender system based on a spatio-temporal factor analysis model
Autonomous vehicle implementation predictions
Online spatio-temporal matching in stochastic and dynamic domains
Dispatch of autonomous vehicles for taxi services: A deep reinforcement learning approach
Coordinated Multi-Robot Exploration Under Communication Constraints Using Decentralized Markov Decision Processes
A column generation approach for graph coloring
Fleet management for vehicle sharing operations
Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs
Towards reducing taxicab cruising time using spatio-temporal profitability maps
SCRAM: A Sharing Considered Route Assignment Mechanism for Fair Taxi Route Recommendations
Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs
A branch-and-price algorithm for the generalized assignment problem
Optimal passenger-seeking policies on E-hailing platforms using Markov decision process and imitation learning
The dynamic assignment problem
A Mixed Path Size Logit-Based Taxi Customer-Search Model Considering Spatio-Temporal Factors in Route Choice
Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach
When Recommender Systems Meet Fleet Management: Practical Study in Online Driver Repositioning System
A Dynamic Adjustment Model of Cruising Taxicab Fleet Size Combined the Operating and Field Survey Data
A Markov decision process approach to vacant taxi routing with e-hailing

Acknowledgments

The author Yue Yang would like to acknowledge his valuable experience as an intern at Didi Chuxing. All data and code for this work are shared on GitHub.