key: cord-0603346-g88s2gy2 authors: Liu, Sheng; Shen, Zuo-Jun Max; Ji, Xiang title: Urban Bike Lane Planning with Bike Trajectories: Models, Algorithms, and a Real-World Case Study date: 2020-08-21 journal: nan DOI: nan sha: cf9ec5db369119a0d7edf9346c21df9a8012d6ce doc_id: 603346 cord_uid: g88s2gy2 We study an urban bike lane planning problem based on the fine-grained bike trajectory data, which is made available by smart city infrastructure such as bike-sharing systems. The key decision is where to build bike lanes in the existing road network. As bike-sharing systems become widespread in the metropolitan areas over the world, bike lanes are being planned and constructed by many municipal governments to promote cycling and protect cyclists. Traditional bike lane planning approaches often rely on surveys and heuristics. We develop a general and novel optimization framework to guide the bike lane planning from bike trajectories. We formalize the bike lane planning problem in view of the cyclists' utility functions and derive an integer optimization model to maximize the utility. To capture cyclists' route choices, we develop a bilevel program based on the Multinomial Logit model. We derive structural properties about the base model and prove that the Lagrangian dual of the bike lane planning model is polynomial-time solvable. Furthermore, we reformulate the route choice based planning model as a mixed integer linear program using a linear approximation scheme. We develop tractable formulations and efficient algorithms to solve the large-scale optimization problem. Via a real-world case study with a city government, we demonstrate the efficiency of the proposed algorithms and quantify the trade-off between the coverage of bike trips and continuity of bike lanes. We show how the network topology evolves according to the utility functions and highlight the importance of understanding cyclists' route choices. The proposed framework drives the data-driven urban planning scheme in smart city operations management. Urbanization is a global trend. More than half of the world's population now lives in towns and cities. It is projected that about 56% of the developing world and 82% of the developed world will be urbanized by 2030, which equals to 5 billion people (The Economist 2012). Much of this urbanization will unfold in Africa and Asia, bringing huge social, economic and environmental transformations (United Nations Population Fund 2018) . The main challenges brought by the fast urbanization include traffic congestion and air pollution. The growing number of cars exceeds the city's traffic carrying capacity and thus causes severe traffic congestion around the city. In the meanwhile, the vast amount of emissions generated by moving cars worsens the air quality and poses serious public health problems. For Beijing, cars accounted for 30% of city's self-generated pollutants contributing to air pollution (South China Morning Post 2018) . In addition to driving cars, cycling is a popular urban transit mode for daily commute. For instance, 30% of people go to work by riding bikes in Cambridge, UK (Department for Transport, UK 2015). According to U.S. Census Bureau (2012), 864,883 people cycled to work in the United States. Cycling is promoted by many countries as it benefits city residents in many different aspects. First, cycling is free of pollution and the large-scale adoption of cycling can benefit the urban environment. Second, the popularity of cycling could alleviate the traffic congestion and improves the overall traffic condition. Third, cycling is also a more affordable and healthy transit mode than driving cars. As shown by the City of Copenhagen (2010), mortality is reduced by 30% in adults who cycle to and from their workplace on a daily basis, which can translate to huge savings of health care cost. During the COVID-19 pandemic, bike ridership has increased by 21% in US urban areas as people are shifted from public transit to this open-air mobility mode (The Los Angeles Times 2020). To promote cycling and reduce bike crashes, bike lanes have been planned and constructed by city planners. In Amsterdam, for example, there are 250 miles of dedicated bike lanes in use. In the U.S., city governments are rolling out better bike lanes in a large scale. For instance, Chicago has planned to build 50 miles of bike lanes by 2019 (Chicago Tribune 2015) . Cities from developing countries, such as China and India, are also investing heavily in protected bike lanes (The Seattle Times 2016). To accommodate the booming cycling demand during the 2020 pandemic, municipal governments across Canada such as Toronto Council have fast-tracked their bike lane expansion plans (CBC 2020) . The construction of bike lanes improves the safety for cyclists, car drivers as well as pedestrians. It is reported that fatality rate is less than a tenth as high in the countries with well designed cycling road networks as in the countries without cycling friendly infrastructures (Pucker 2001) . Traditionally, bike lanes are planned based on experience and surveys. As smart phones are widely used, the GPS human mobility data becomes more available and makes it possible to utilize this fine-grained data in bike lane planning. Most recently, smart city infrastructure such as stationbased and dock-less bike sharing systems are expanding quickly across the world. Some examples include Citi Bike (New York, USA), Santander Cycles (London, UK) for station-based systems and Lime Bike (USA), Mobike (China) for dock-less systems. Bay Wheels, the bike sharing system operated in San Francisco Bay Area, has maintained a mixed fleet of station-based and dock-less bikes (mainly electric models). In these systems, people can use smart phones to pick up and drop off bikes at docking stations or arbitrary locations with a built-in smart lock. Bikes in the dock-less system are often embedded with GPS devices (thus often called "smart bikes", e.g., see The New York Times 2017) so the spatial bike trajectories can be tracked. These bike trajectories record the detailed travel pattern of bike riders, which can then be used to understand the route choices made by the cyclists. Compared to the survey statistics and conventional origin-destination data, the trajectory data generated from the smart city infrastructure reveals cyclists' footprints on the road network, which are otherwise difficult to collect and unknown to policy makers. The trajectory data is essential to bike lane planning as most bike lanes are constructed along the existing road network. The bike trajectory data is becoming increasingly accessible to city planning agencies and researchers as bike sharing system operators are pursuing open data policies. For instance, Mobike is fully collaborating with municipal governments to facilitate urban management (Xinhuanet 2017, Mobike 2018) . The Los Angeles Department of Transportation has created an open-source software platform called Mobility Data Specification (MDS) that allows authorities to collect data directly from mobility service providers (including dock-less bikes and scooters) in real time 1 . MDS is now used by more than 50 American cities and dozens more around the globe (The New York Times 2020). How to analyze and utilize this high-volume trajectory information to accomplish the smart city vision will be a critical challenge to city planners. This paper presents and solves the bike lane planning problem using the detailed bike trajectory data. The bike lane planning problem decides on which road segments of the existing road network to construct bike lanes, aiming to balance two main objectives: 1) coverage: cover as many bike trips as possible and 2) continuity: build more continuous bike lanes to minimize interruptions. We summarize our main results and contributions as follows. 1. We present the bike lane planning model in view of the cyclist's utility functions based on the trajectory data. We start from a simple adjacency-continuity utility function and then discuss the general class of utility functions. The choice of the utility functions is flexible to characterize the trade-off between the coverage and continuity objectives. To the best of our knowledge, our work is the first to formalize the general bike lane planning model built upon the bike trajectory data. 2. For the simple adjacency-continuity utility function, we show that the resulting bike lane planning model has a supermodular objective function and admits an efficient mixed integer linear program (MILP) formulation. For the general utility functions, we show that under reasonable conditions, the objective function is also supermodular and the resulting problem yields a polynomialtime solvable Lagrangian dual. Furthermore, we provide a linear programming approach to the Lagrangian relaxation subproblem and propose an efficient algorithm for the general utility functions. 3. To capture the interaction between cyclists' route choices and bike lane design, we propose a route choice based bike lane planning model and formulate it as a bilevel program. By exploiting the structure of the lower-level problem, we reformulate the bilevel program as a single-level MILP, which is asymptotically exact and computationally tractable. 4. We present a real-word case study based on the collaboration with an urban planning institution and a dock-less bike sharing company. We collect and preprocess a large bike trajectory data set, and test our models and algorithms via extensive numerical experiments. The numerical experiments validate the efficiency of the proposed algorithms and deliver insightful comparison results between different models. We show how the topology of bike lane network would change according to the utility functions and provide quantitative measures to analyze the trade-off between coverage and continuity. We also highlight the importance of understanding cyclists' route choice behaviors in the bike lane planning. Liu, Shen, and Ji: Bike Lane Planning 5 The remainder of the paper is organized as follows. Section 2 reviews the related literature. Section 3 develops the bike lane planning model and presents tractable formulations and structural results, which motivates efficient algorithms. Section 4 is devoted to the real-world case study and numerical experiments that deliver managerial insights for policy makers. Section 5 concludes the paper and presents several future research directions. Smart city operations have received growing attention from the operations management community. Different aspects of the smart city movement have been studied, including smart grid, smart transportation, and smart retail. The "smart" parts of these aspects lie in the innovative technologies that can disrupt the present urban environment and city operations, or new data collection and data analytics tools to inform better understanding and decisions for city management. For instance, Zhang et al. (2020) study a promising scheme where vehicle-to-grid (V2G) electricity selling is integrated in electric vehicle sharing systems, as enabled by new technological development. They provide important strategic planning and operational tools to support the advancement of this scheme. Shared mobility, where shared passenger cars are deployed to provide ride and logistics services, has been examined from various perspectives such as pricing (Taylor 2018 , Bai et al. 2018 , Guda and Subramanian 2019 , admission control (Afeche et al. 2018) , repositioning (He et al. 2020) and last-mile delivery (Qi et al. 2018) , among others. In retailing, recent development of IT and big data tools have enabled innovative omni-channel strategies (Gao and Su 2016, Harsha et al. 2019 ) and data-driven pricing and logistics policies (Perakis et al. 2018 , Glaeser et al. 2019 , Liu et al. 2019 . We refer readers to Mak (2018) and Qi and Shen (2019) for thorough reviews of papers in smart city operations. Our paper is among the first to develop rigorous analytical models to tackle an urgent city planning problem built upon new mobility data sources, and thus promotes the smart city vision. There has been an increasing focus on the planning and control of bike facilities, in particular, the optimization of bike-sharing systems in recent years. The majority of the existing literature tackles the operational problems of bike sharing systems, such as balancing bike stock levels among bike stations to satisfy temporal and spatial demand. For example, Shu et al. (2013) address the bike deployment and redistribution problems with network flow formulations. O'Mahony and Shmoys (2015) study similar rebalancing problems as well as routing and clustering problems with different integer programming techniques. In addition, empirical approaches have been applied to shed lights on the optimal configuration of bike share systems (Kabra et al. 2019) . However, few papers consider the roads used by cyclists to travel between bike stations, which is another fundamental deciding factor of a bike-sharing system's success. Noticeably, the above three papers only utilize the origin-destination information of bike rides while our paper incorporates a fine-grained bike trajectory data set, which makes it possible to capture more detailed travel patterns of bike riders. Traditionally, bike lane planning is not built upon reliable real-world data. A prominent approach of traditional bike-lane planning is to select road segments (on which to construct bike lanes) by evaluating them with respect to a set of predetermined guidelines and criteria. Dondi et al. (2011) propose a context sensitive approach and make the selection based on the analysis of the visual effects, environmental contexts, and traffic considerations of road segments. Rybarczyk and Wu (2010) select road segments with a modified simple additive weighting method to calculate an overall score for each road segment based on its rankings among all road segments for each criterion. Since the guidelines and criteria in these papers are decided based on expertise or specific needs, there is a potential risk of subjective bias. Another classical approach is to select road segments based on cyclists survey results that reveal their preferences. One such approach is built on stated preference surveys, in which respondents are asked to imagine their preferences in some hypothetical cases (Tilahun et al. 2007 , Stinson and Bhat 2003 , Hunt and Abraham 2007 . However, results from the stated preference surveys could be inaccurate as they do not represent cyclists' actual choices. Researchers also adopted revealed preference surveys, in which respondents reveal their actual choices (Sener et al. 2009 , Howard and Burns 2001 , Hyodo et al. 2000 . Results from revealed preference surveys are also subject to sampling biases and small sample biases. In addition to subjective and survey based methods, there are a few papers that develop analytical methods for the bike lane planning problem. Lin and Yang (2011) consider a strategic designing problem for bike sharing, in which the location of bike stations and bike lanes are decided to minimize the system-wide operating cost. Their model only accounts for the origins and destinations while ignoring the road network. As a result, their model can not be directly used to guide the practical bike lane construction. Most recently, Bao et al. (2017) propose several heuristics to decide the bike lane locations by maximizing a specific score function under budget and connectivity constraints. Our paper, on the other hand, proposes a general and tractable modeling framework for the bike lane planning, and derives structural results about the resulting planning problem. The structural results motivate efficient algorithms with empirically validated performance. From a modeling perspective, our paper is related to the general class of facility location problems. While many traditional facility location models are focused on locating warehouses/distribution centers/retailing stores to maximize profits or minimize operational costs (Snyder and Shen 2019), we consider locating bike lanes on the existing road network to maximize riders' utility. Hence our paper echoes the emerging location models in nonprofitable operations and healthcare operations (Cho et al. 2014 , Chan et al. 2017 ). We first introduce the bike lane planning model based on bike trajectory data with a specialized utility function in Subsection 3.1. We then analyze the structural properties of the model and discuss a class of general utility functions in Subsection 3.2. Lastly, we extend the model to consider cyclists' route choices in Subsection 3.3. Let V be the set of all road segments that have been visited by bike riders in our data set and M be the set of cyclists. We say two road segments i and j are neighbors, (i, j) ∈ N , if they are connected. We can also interpret (i, j) ∈ N as road intersections. For each road segment i ∈ V , we use d i to denote the associated number of bike trips. Similarly, each pair of connected road segment (i, j) ∈ N is also associated with the number of trips that go through the corresponding intersection, d ij . We consider two main objectives of designing bike lanes inspired by the literature and our communication with the biking community: 1. Constructed bike lanes should be able to cover as many bike trips as possible. Continuity is preferred for both cyclists and urban planning institutions. Krizek and Roland (2005) show that the discontinuity of bike lanes can cause great discomfort to cyclists. The discontinuity at intersections also generates potentially higher crash risks. For the government, discontinuous bike lanes can pose management challenges as well as construction difficulties. The coverage objective and the continuity objective are often conflicting with each other, as shown in Bao et al. (2017) . Maximizing only the coverage may lead to very dispersed bike lanes while maximizing only the continuity can leave many cyclists uncovered. So we need to find the ideal trade-off between the two objectives. Now we formalize the two objectives from the cyclist's perspective. For a cyclist m, we use Summing over the utility functions of all cyclists gives where d i is the number of trajectories going through road segment i and d ij is the number of trajectories going through the intersection (i, j). So i∈V d i x i stands for the total number of covered road segments (with bike lanes) weighted by the travel demand, and (i,j)∈N d ij x i x j is the additional continuity utility for two adjacent bike lanes weighted by the travel demand. Since the above utility function measures the continuity utility along two adjacent road segments, we call this utility function as the adjacency-continuity (AC) utility function. Note that in our discussion we treat each road segment equally regardless of the length for the ease of exposition and the analysis extends straightforwardly to consider the impact of length on the utility. The objective function measures the cyclists' welfare from bike lanes. The parameter λ ≥ 0 determines the relative continuity benefit to the cyclist. A higher λ means continuity is more desirable and thus a more continuous bike lane network would be proposed. Constraint (1) is the budget constraint that ensures the total construction cost does not exceed the allowable government budget B. In practice, there may be other constraints that limit the construction of bike lanes in certain regions, which can be incorporated as needed. Furthermore, it is possible that building bike lanes along certain roads may reduce the capacity for car traffic flows, and hence worsening the traffic condition. To account for this, we can add additional penalty terms to our objective function. The detailed modeling of this traffic impact is out of the scope of this paper and we leave it for future research. The above model is set up for the scenario when a city is planning bike lanes from scratch. For cities that already have bike lanes in use (such as New York City), the proposed model can help guide the network expansion: we can fix the decision variables x i = 1 and set c i = 0 for road segments that already have bike lanes in use, and B is equal to the expansion budget. The obtained solution will then identify road segments to build new bike lanes. 3.1.1. Analysis When λ = 0, the problem reduces to the classical Knapsack problem, which is NP-hard. Otherwise, the problem is a special case of the 0-1 quadratic knapsack problem (QKP), where the coefficient matrix for the quadratic terms have a sparse structure. If d ij is separable and can be decomposed as d ij =d idj , then the objective function is referred to as the half-product function. The maximization of the half-product function over a knapsack polytope is known to admit a Fully Polynomial-Time Approximation Scheme (FPTAS). However, no FPTAS is available for the general non-separable objective function. Nevertheless, similar to the general QKP, it can be shown that the objective function with d ij ≥ 0 is supermodular. The proofs of Lemma 1 and other results in this section are presented in the Appendix. The supermodularity result implies that the problem can be solved efficiently without the budget constraint. So we can adopt the Lagrangian relaxation methodology to relax the budget constraint. The resulting Lagrangian dual is given as where Φ(u) = max Here Φ(u) is the Lagrangian relaxation of BL-AC for dual variable u ≥ 0. Based on the result of Gallo and Simeone (1989) and Chaillou et al. (1989) , Φ(u) can be solved in polynomial time. Proposition 1. When λ ≥ 0 and d ij ≥ 0 for all (i, j) ∈ N , the Lagrangian dual of BL-AC can be solved in polynomial time. More specifically, one can show that the Lagrangian dual is a piece-wise linear convex function with at most |V | break points. And each Φ(u) is equivalent to a maximum flow problem (Chaillou et al. 1989 ). There are other Lagrangian relaxation methods for QKP that rely on relaxing different sets of constraints (e.g. Caprara et al. 1999) , we refer readers to Pisinger (2007) for an extensive review. Since the Lagrangian dual only provides an upper bound, we may still need to perform branch and bound to get an exact solution. Alternatively, we can get an equivalent mixed integer linear programming (MILP) formulation to BL-AC, which can deliver satisfactory computational performance with the commercial MILP solvers. Although the objective function of BL-AC is nonlinear, we can linearize the product terms in BL-AC by replacing x i x j with y ij , and derive the following MILP formulation for bike lane planning: Constraints (5) The AC utility function assumes the continuity utility only applies to two adjacent road segments. Hence, in this section, we discuss a more general class of cyclist utility functions with the consideration of continuity beyond adjacency. Given a trajectory r and a bike lane construction plan x, let S x (r) denote the set of subsets of maximal continuous road segments with bike lanes on r. where f (·) is an increasing function. Under the adjacency-continuity utility function, f (|s|) = |s| + λ(|s| − 1) = (λ + 1)|s| − λ, which is a linear function of |s|. And the score function used by Bao et al. (2017) is a special case of (11), wherein f (|s|) = |s|α |s| with α ≥ 1. We will refer the bike lane planning model with the general utility function as BL-GU. Although maximizing the general utility function is often challenging due to the nonlinearity, we can show that the utility function (11) has a desirable structure when f (·) is further assumed to be convex. The convex assumption of f (·) is consistent with the notion that cyclists receive additionally more benefits by riding through more continuous bike lanes. For example, utility functions such as the ones with f (|s|) = |s|α |s| (α > 1) are supermodular. With supermodular utility functions, the general utility maximization problem over the budget constraint has a polynomial-time solvable Lagrangian dual problem. is an increasing convex function, maximizing the utility function v x (r) defined in (11) over a budget constraint yields a polynomial-time solvable Lagrangian dual. However, different from the adjacency-continuity utility function, each iteration of the Lagrangian relaxation under the general utility function is not equivalent to a maximum flow problem. Instead, we can use a general supermodular maximization oracle such as Fujishige's minimum-norm-point algorithm (Fujishige 2005) . Then the computational performance of solving the Lagrangian dual problem heavily depends on the efficiency of the supermodular maximization oracle. In our case, the minimum-norm-point algorithm is not computationally efficient. Nevertheless, the bike lane planning problem using the general utility function (11) can be formulated as an MILP. Proposition 2. Under the general utility function (11), the bike lane planning problem can be solved as an MILP. Note that the MILP formulation is attainable without assuming f (·) is convex. Specifically, the general utility function can be represented as and v x (r) = β 1 x 1 + β 2 x 2 + β 3 x 3 + β 1,2 x 1 x 2 + β 2,3 x 2 x 3 + β 1,2,3 x 1 x 2 x 3 , where coefficients β's can be calculated as (1), More generally, β l = f (|l|) − 2f (|l| − 1) + f (|l| − 2) for a nonempty l (the proper definition requires f (−1) = 0). When f (·) is an increasing convex function, all β's are nonnegative. The above analysis can be extended to consider heterogeneous road lengths. When the utility depends on the riding distance, the | · | function in (11) can be interpreted as the total length function for a given subset of continuous road segments. Now denote the trajectory r as {i 1 , . . . , i n }, the coefficients β l in Equation (12) can be calculated as follows: . . , i j+k−1 }|). Hence, in the example where r = {1, 2, 3}, we would have We provide an inductive proof about the validity of the above calculation in Appendix B. Since function (12) only involves product terms with binary variables, we can linearize them to get an MILP in a similar manner to BL-AC-MILP. We call this formulation BL-GU-MILP. i∈V Again, constraints (13) are not necessary if β's are nonnegative. The number of constraints (14) may be intimidating, but we can get a simple reduction by utilizing the nested structure of l. For l ∈ L(r) with |l| > 2, two subsets, l − and l − , can be obtained by removing the first and the last road segment of l, respectively. Since l − , l − ∈ L(r), we can use two constraints y l ≤ y l − and The outer approximation algorithm maintains a lower limit u and an upper limit u for the optimal dual solution such that g(u ) < 0 and g(u ) ≥ 0. Then the function Ψ u ,u (u) = max{Φ(u ) + (u − u )g(u ), Φ(u ) + (u − u )g(u )} is an outer approximation of Φ(u) (note that Φ(·) is a convex piecewise linear function). Observe that Ψ u ,u (u) has a minimizer of u * = [(Φ(u ) − u g(u )) − (Φ(u ) − u g(u ))]/(g(u ) − g(u )) = [S(x(u )) − S(x(u ))]/(c(x(u )) − c(x(u ))), and we use it to update the lower/upper limit. Because Φ(u) has at most |V | break points, this outer approximation c(x(u ))−c(x(u )) , and solve for x(u * ) with LP. Set u = u * if c(x(u * )) > B and u = u * otherwise; Update u * = S(x(u ))−S(x(u )) c(x(u ))−c(x(u )) , and solve for x(u * ) with LP. 11: end if algorithm terminates with at most |V | + 1 iterations (Gallo and Simeone 1989) . At termination, we obtain u * and x(u * ), which are the optimal dual solution and its corresponding primal solution, respectively. If the solution x(u * ) satisfies the budget constraint at equality, then it is also optimal to BL-GU-MILP; otherwise we select road segments from the subset V (u * ) = {i ∈ V : x i (u * ) = 1} by solving a new MILP, as presented in step 10 of the algorithm. Because V (u * ) is smaller than V (especially when B is not so large), the new MILP involves much less variables and can be solved more efficiently. We test the efficiency of this algorithm in the numerical study. Up till now our model makes no assumptions about cyclists' responsive behaviors to the bike lane construction plan. The aforementioned models maximize the cyclists' utility assuming their route choices are fixed (their trajectories will not be impacted by the constructed bike lanes). This assumption may not be valid if cyclists update their route choices based on the constructed bike lanes. Since the utility from riding through a route is influenced by the constructed bike lanes, cyclists may choose to take a different route than the observed trajectory if more bike lanes are constructed along that route. To account for cyclists' route choices, we assume for a cyclist m riding from i ∈ V to j ∈ V , she can choose from a set of candidate routes/trajectories C m = {r 1 m , . . . , r tm m }, where each route starts with i and ends with j. In addition to the bike lane utility, cyclists' evaluation of a route also depends on the travel time, slope, noises, and other physical characteristics. Therefore, we add to the utility function an exogenous disutility termv(r) that mainly captures the travel time cost of route r. In practice,v(r) can be estimated beforehand and assumed to be known. Given a bike lane construction plan x, cyclist m chooses the route r ∈ C m with probability p mr according to some choice model. Then the objective of the bike lane planning problem is where D m is the aggregated demand of cyclist m (after aggregation based on origin-destinations). The bike route choice is often modeled using the Multinomial Logit model (MNL), as shown in Hood et al. (2011) and Khatri et al. (2016) . Under the MNL model, the probability p mr is given by . Because many alternative routes have overlapping road segments, the assumption of irrelevant alternatives may not be satisfied. To relieve this concern, a correction term called Path Size factor (PS) can be added, e.g., see Broach et al. (2012) . This correction term can be calculated beforehand and is independent of the bike lane decision, thus the structure of p mr remains the same. The consequent bike planning problem is similar to an assortment optimization problem as decisions in both problems shape the choice probabilities. However, while the assortment decision influences the choice probabilities by altering the choice set, the bike lane construction decision transforms the choice probabilities by changing the utility values. That being said, the choice set in our problem is invariant to the decision variables, which is a critical difference between our problem and the assortment optimization problem. Furthermore, unlike the assortment optimization where the marginal profit of each product is exogenously given, the "profit" from bike lanes v x (r) depends on the decision variables. Hence problem (18) also shares a similar structure to the joint assortmentpricing optimization problem. However, the decisions here are binary and the analysis in the pricing literature can not carry on. To see this, first note that the optimal solution to (LL) must be positive, then let γ denote the dual variable of constraint (20). The optimality condition is Combining with Equation (20), we obtain that . Therefore, the original bike lane planning model can be reformulated as a bilevel program with the lower-level problem (LL). Because (LL) is convex, we can borrow the ideas from convex bilevel programming to solve the resulting bike lane planning model. (2019), we devise a linear approximation approach that approximates the convex parts in the objective function of (LL) by piecewise linear functions. Specifically, we sample from [0, 1] K points {p 1 , p 2 , . . . , p K } such that p i < p j for all i < j. Then the piecewise linear approximation of p mr ln p mr takes the form of p(ln p k + 1) − p k , ∀k = 1, . . . , K. Therefore, we obtain the linear approximation of (LL) as min pmr:m∈M,r∈Cm m∈M r∈Cm ω mr ≥ p mr (ln p k + 1) − p k , ∀m ∈ M, r ∈ C m , k = 1, . . . , K, Next, we can replace (LL) in the bike lane planning problem by a set of optimality condition constraints associated with (LL-Lin), i.e., primal/dual feasibility and complementary slackness constraints. In order to derive an MILP, the final step is to linearize product terms p mr v x (r) that appear in both (LL-Lin) and the objective function (18). Note that v x (r) is a sum of binary variables, and we can write p mr v x (r) as where auxiliary variables ζ mrl are introduced along with the following two linear constraints ζ mrl ≤ p mr , ∀l ∈ L(r), ζ mrl ≤ y l , ∀l ∈ L(r). MILP. We provide the detailed formulation and more technical details in Appendix C The performance of our linear approximation approach depends on the linear approximation accuracy. We prove that this approach is asymptotically exact when K increases, as indicated in the following result. We apply the proposed models and algorithms to a real-world trajectory data set. First, we describe the data set and present its summary statistics. Then we discuss the computational performance and solution quality of the proposed algorithms. Lastly, we compare the bike lane construction plans generated by different models with varying parameters, with and without consideration of cyclists' route choice models. We obtain a GPS bike trajectory data set via our collaboration with the urban planning institution Therefore, we decided to only keep the data from March-July 2017. Note that during the collection period, there were two other dock-less bike-sharing platforms operating in Zhuhai, and the demand was split between these platforms. Since the competing platforms share a similar business models (and their bikes have similar designs), we assume our trajectory data can approximately represent the travel pattern and route preferences of bike riders. Nevertheless, the data set is potentially biased due to bike availability and infrastructure issues. The observed demand is censored as potential ride demand may be lost due to the lack of available bikes or safe bike infrastructure in the neighborhood. To alleviate the the availability bias, we apply a decensoring approach to reconstruct the true demand, which will be detailed in the sequel. To uncover the lost demand due to missing infrastructure, we can potentially combine the observed demand data with user click data (on mobile apps) and survey data to understand the riders' behaviors and preferences (on safe infrastructure). We will leave this as a future research direction. The obtained trajectories are mapped to the road network extracted from OpenStreetMap (Geofabrik 2018) 2 . Each trajectory contains a timestamp that indicates the start time of a trip and a series of GPS coordinates that were recorded every 5 seconds. After removing trajectories that are shorter than one minute, there are 109,640 bike trajectories in total. Data Decensoring Our original trajectory data is subject to the censoring bias from stock-out events, e.g., potential riders can not fulfill their demand if the neighborhood area has no bikes available. To address this issue, we follow the idea from O'Mahony and Shmoys (2015) to decensor the trip demand and the corresponding trajectory data. Specifically, we first identify the stock-out events from the observed stock evolution in each neighborhood, and then calculate the average demand conditional on that there are no stock-outs for each neighborhood. Afterwards we reweight the trajectory data to be aligned with the conditional average demand. We provide additional details about this decensoring procedure in Appendix E. Figure 1 presents the distribution of bike trajectory duration from our data set. The average trajectory duration is 903.6 seconds (15.1 minutes), and the majority of trajectories have duration shorter than 20 minutes. This is because most trajectories are limited to the urban and residential areas of Zhuhai. We identify 3,735 road segments from the trajectories in total. The average length of the road segments is 195.3 m, and more than 60% of road segments have lengths under 200 meters. Most roads are constructed in the urban area of Zhuhai with high density of population, and thus intersections are close to each other in this area. Bike trajectory duration distribution. Figure 2 presents the spatial distribution of the road segment usage frequency (the number of trajectories passing through the road segment). We observe that the locations of the road segments with high usage levels are located in a few areas in Zhuhai, which include the financial district, shopping district, and residential districts of the city. We also find that many popular road segments are spread out over the city, which implies that merely maximizing the coverage of bike trips would result in a highly discontinuous bike lane system. More details about the data set can be found in Appendix D. We evaluate the computational performance of the proposed models and algorithms on the practical road network and trajectory data set. For BL-GU, the utility function takes the form of v x (r) = s∈Sx(r) |s|α |s| . We vary the choices of λ, α, B, and the number of sampled trajectories (cyclists) m. The experiments were conducted with Gurobi (Gurobi Optimization 2018) and ran on a Windows Regarding the bike lane planning model with route choices, we implement the linear approximation scheme and the reformulated MILP as the solution approach. We set the approximation sample size K = 20, and the termination criteria to be: 1) the optimality gap (MIP gap) is below 0.1%; or 2) the solution time exceeds 6 hours. The computational performance is summarized in Table 2 (we only present the result for BL-GU as it is more difficult to solve than BL-AC in general). We observe that for practical size problems, a high-quality solution (with a small optimality gap) to the linear approximation model can be found within a few hours. The optimality gap is less than 0.1% for most instances and is 0.2% in the worst case. Hence the proposed MILP reformulation model is computationally tractable. We compare the bike lane planning solutions generated from BL-AC and BL-GU in terms of quantitative topological measures as well as visualization results. The setup is the same as in Table 3 presents the comparison results based on these features for BL-AC and BL-GU with different parameter values. Note that BL-GU with α = 1 is equivalent to GL-AC with λ = 0. We have several observations. First, in BL-AC (and BL-GU), increasing the value of λ (and α) leads to fewer selected bike lanes. Because when the continuity is assigned a small weight, the planning model tends to build bike lanes on more road segments to cover more bike trajectories. And when the continuity is more preferred, it is beneficial to build bike lanes in a few areas to make sure the bike lanes are connected to each other. Second, as λ grows in BL-AC, the number of continuous bike lane pairs increases, which is consistent with the objective function of BL-AC that maximizes the adjacency continuity. In the meanwhile, the mean size of continuous bike lanes, the maximum size of continuous bike lanes, and the mean number of connections per bike lane also increase. By contrast, we observe that the number of continuous bike lane pairs first increases and then decreases while other continuity measures increase with α in BL-GU. This is because given a larger value of λ, BL-GU tends to value the size of continuous bike lanes more than the adjacency includes both the observed route as well as the routes returned by Google. We apply the MILP model developed in Subsection 3.3 and choose the linear approximation sample size to be 20. The utility function is assumed to take the form of v x (r) − ηL r for a route The impact of B on the coverage ratio and mean size of continuous bike lanes. cyclists' choices, see, e.g., Khatri et al. 2016 , Datner et al. 2019 . The parameter η measures the disutility per km of riding distance, which can be estimated following the MNL model. Here we set η = 20 such that v x (r) is comparable with ηL r 3 . Figure 6 presents the bike lane planning results for BL-GU with route choices (B = 30 and α = 1.02, 1.05). We observe that accounting for route choices has a significant impact on the resulting bike lane network. Specifically, after incorporating the route choice model, the optimal bike lane network tends to spread out to cover more routes other than the observed ones, which compromises continuity to some extent. This highlights the importance of understanding cyclists' routing choices when bike lanes are present (including both coverage and continuity considerations). Policy makers can employ empirical approaches to understand the behaviors and integrate them with the analytical models proposed in this paper. This paper develops a set of novel formulations and algorithms to the bike lane planning problem that is becoming increasingly important in smart city management. Unlike the previous work that mainly builds on surveys and heuristics, we present a modeling framework that directly utilizes the GPS bike trajectory data from emerging dock-less bike sharing systems. Our model formalizes can impact the interaction between car flows and bike flows, and eventually change the traffic equilibrium of the whole urban environment. Second, since cyclists' responsive behaviors are often hard to predict before any bike lane is deployed, the city planning agencies may dynamically construct bike lanes in the city, e.g., first build a few bike lanes to learn the behaviors and then add more bike lanes. Then the problem becomes a dynamic strategic planning model with behavior learning. Third, jointly designing the bike lanes with other bike facilities such as bike sharing stations and parking areas can be of interest to the municipal governments. arrest location uncertainty via row-and-column generation. Note that the approximation error only comes from the piecewise linear approximation of function p ln p in the lower-level problem (LL). Let p be the solution to (LL) and p be the solution to (LL-Lin) (given decision x). And we use h(·) and h (·) to denote the objective function of (LL) and (LL-Lin), respectively. It follows that ∇h(p) = (ln p mr + v x (r) −v(r)) mr . It can be easily verified that ∇h(p) is strongly monotone with modulus 1 for 0 < p ≤ 1 (the Jacobian matrix is positive definite diagonal, and the smallest possible eigenvalue is 1, see also Dan and Marcotte 2019) . From the definitions of p and p , we have ∇h(p), p − p ≥ 0, ∇h (p ), p − p ≥ 0 ⇒ ∇h (p ) − ∇h(p), p − p ≥ 0. By the properties of ∇h(·), it holds that ∇h(p) − ∇h(p ), p − p ≥ ||p − p || 2 (from the Jacobian matrix), hence ∇h (p ) − ∇h(p ), p − p ≥ ||p − p || 2 . Furthermore, the incurred difference in the objective function of the upper-level problem (18) where D = max m D m , V = max x,r |v x (r) −v(r)|, and C * = arg max m |C m |. The second inequality follows directly from the CauchySchwarz inequality, and the third inequality is derived by applying the CauchySchwarz inequality to (25). Next, we bound the gap between ∇h (p ) and ∇h(p ). By the construction of h (·), each component of ∇h (p ), ∇h (p ) mr , is a piecewise constant approximation of log(p mr ). Let the K sampling points {p 1 , p 2 , . . . , p K } satisfy: 1) the samples start from p min to 1; and 2) the sample points are chosen such that the constant approximations are vertically equidistant. Then we have |∇h (p ) mr − ∇h(p ) mr | ≤ log 1 p min K . Ride-hailing networks with strategic drivers: The impact of platform control capabilities on performance Coordinating supply and demand on an on-demand service platform with impatient customers. Manufacturing & Service Operations Management Planning bike lanes based on sharing-bikes' trajectories Where do cyclists ride? a route choice model developed with revealed preference gps data Exact solution of the quadratic knapsack problem Bike lanes installed on urgent basis across canada during covid-19 pandemic Best network flow bounds for the quadratic knapsack problem. Combinatorial Optimization Robust defibrillator deployment under cardiac Liu Simultaneous location of trauma centers and helicopters for emergency medical service planning Copenhagen city of cyclists Competitive facility location with selfish users and queues Setting inventory levels in a bike sharing network Transport minister encourages people to get on their bike for cycle to work day Bike lane design: the context sensitive approach Submodular functions and optimization On the supermodular knapsack problem Omnichannel retail operations with buy-online-and-pick-up-in-store Openstreetmap china Optimal retail location: Empirical methodology and application to practice: Finalist-2017 m&som practice-based research competition Your uber is arriving: Managing on-demand workers through surge pricing, forecast communication, and worker incentives Dynamic pricing of omnichannel inventories. Manufacturing & Service Operations Management Robust repositioning for vehicle sharing A gps-based bicycle route choice model for san francisco, california Cycling to work in phoenix: route choice, travel behavior, and commuter characteristics Influences on bicycle use Modeling of bicycle route and destination choice behavior for bicycle road network plan Bike-share systems: Accessibility and availability Modeling route choice of utilitarian bikeshare users with gps data What is at the end of the road? understanding discontinuities of on-street bicycle lanes in urban settings Strategic design of public bicycle sharing systems with service level constraints On-time last mile delivery: Order assignment with travel time predictors Enabling smarter cities with operations management. Available at SSRN 3307458 How cycling changes cities Data analysis and optimization for (citi) bike sharing Joint pricing and production: A fusion of machine learning and robust optimization The quadratic knapsack problema survey Cycling safety on bikeways vs. roads Shared mobility for last-mile delivery: design, operational prescriptions, and environmental impact A smart-city scope of operations management Bicycle facility planning using gis and multi-criteria decision analysis An analysis of bicycle route choice preferences in texas, us Models for effective deployment and redistribution of bicycles within public bicycle-sharing systems Fundamentals of supply chain theory From coal to cars: Beijing moves up a gear in the war against air pollution Commuter bicyclist route choice: Analysis using a stated preference survey On-demand service platforms. Manufacturing & Service Operations Management f (|s |) for some s ∈ S B (r) ii) when both i − 1 and i + 1 belong to B, v B∪{i} (r) − v B (r) = f (|s −1 | + |s +1 | + where s −1 , s +1 ∈ S B (r) and i − 1 ∈ s −1 , i + 1 ∈ s +1 . Because A ⊆ B, we can verify that |s | ≥ |s| and |s −1 | + |s +1 | > |s|. As f (·) is increasing convex • If both i − 1 and i + 1 belong to A, v A∪{i} (r) − v A (r) = f (|s −1 | + |s +1 | + 1) − f (|s −1 |) − f (|s +1 |) Similarly, both i − 1 and i + 1 must belong to B as well, hence v B∪{i} (r) − v B (r) = f (|s −1 | + |s +1 | + 1) − f (|s −1 |) − f (|s +1 |), where s −1 , s +1 ∈ S B (r) and i − 1 ∈ s −1 , i + 1 ∈ s +1 . Because A ⊆ B, |s −1 | ≥ |s −1 | and |s +1 | ≥ |s +1 |. It follows that |s −1 | + |s +1 | ≥ |s −1 | + |s +1 | and of Corollary 1 Based on Theorem 1, the utility function v x (r) is supermodular and maximization of this utility function over a budget constraint is a supermodular knapsack problem we can further apply standard linearization techniques to get an MILP formulation, as shown in BL-GU-MILP. Proof of Proposition 3 When f (·) is an increasing convex function, the coefficients β l in BL-GU-MILP are all positive: β l = f (|l|) − 2f (|l| − 1) + f (|l| − 2) = f (|l|) − f (|l| − 1or +1, i.e., P T ij ∈ {−1, 0, +1}; 2) Each column contains at most two non-zero entries In this appendix we provide the technical proofs of the results in the paper as well as additional modeling details (the MILP reformulation to BL-GU with route choices, and data decensoring steps). Proof of Lemma 1 Let two sets of road segments A and B satisfy A ⊆ B. Consider adding a road segment k / ∈ B to the two sets. In terms of the first part of the objective function, the change caused by adding k is the same for A and B because of linearity. For the second part, as λ ≥ 0 and d ij ≥ 0 for all (i, j) ∈ N , the value of the second part can only increase with the newly added e. Since B can only contain more road segments than A, there are potentially more neighbors of k that are included in B. As a result, the increase of objective value caused by adding e is greater for B than A.Proof of Proposition 1 Directly from Lemma 1, BL-AC is essentially a supermodular knapsack problem, which admits a polynomial-time solvable Lagrangian dual, as shown in Theorem 16 of Gallo and Simeone (1989) .Proof of Theorem 1 Consider two bike lane construction plans, i.e. sets of road segments A and B to build bike lanes satisfying A ⊆ B, and a road segment i / ∈ B. We prove the supermularity of v x (r) by conditioning on i: 1) If i / ∈ r, then building a bike lane on i does not impact the value of v x (r), so v A∪{i} (r) = v A (r) andv B∪{i} (r) = v B (r). 2) If i ∈ r, let i − 1 and i + 1 denote the road segment visited before and after i on the trajectory r, respectively (when i is at the head or the tail of the trajectory, either i − 1 or i + 1 is empty and our analysis can be easily extended). We further consider the following three scenarios:Similarly, for B, as A ⊆ B we have: i) when either i − 1 or iCombining inequalities (26) and (27) Thus the approximation error of the upper-level objective function (18) Appendix B: The validity about the calculation of the coefficients β in general utility functionsFor any trajectory r = {i 1 , . . . , i n } (n ∈ N + ), the chosen β's will need to satisfyWe will prove this by induction. It is easy to verify that the calculation presented in Subsection (3.2) satisfy the above equations for trajectories with cardinality less or equal to three. Now we assume equations (28) are satisfied withfor trajectories with cardinality less or equal to n − 1. Then for a trajectory r = {i 1 , . . . , i n }, we have..,i n+j−3 + · · · + n j=2 β i j + β i 1 ,...,i n−1 + β i 1 ,...,i n−2 + · · · + β i 1 =β r + f (|{i 2 , . . . , i n }|) + β i 1 ,...,i n−1 + β i 1 ,...,i n−2 + · · · + β i 1 ,where the last equality follows by our inductive assumption. Furthermore, note thatwhere the equalities hold by our choices of β's. Combining the equations (29) Hence equations (28) hold for trajectories with cardinality equals to n. The optimality equations for (LL-Lin) can be written aswhere γ m , mrk are the dual variables of constraints (22) and (23) ζ mrl ≤ p mr , ∀m ∈ M, r ∈ C m , l ∈ L(r),ζ mrl ≤ y l , ∀m ∈ M, r ∈ C m , l ∈ L(r).Because the objective is maximizing cyclists' utility, the above constraints will make ζ mrl = p mr y l . Therefore, we obtain an MILP formulation for the bike lane planning model with route choices.Appendix D: Temporal distribution of trajectory Data Figure 7a depicts the distribution of bike trajectories across different hours of a day. It can be observed that there are two demand peaks: one occurs in the morning (7:00 to 9:00 am) and the other occurs in the evening (5:00 to 8:00 pm). These two peaks correspond to the commute rush hours. We also note that the bike trip demand falls gradually after the evening peak and still remains substantial until midnight, which can be attributed to people who engage in leisure activities after work. Figure 7b shows the distribution of bike trajectories across different days of a week. We observe that the bike trip demand is almost stable throughout the week while there are two small peaks on Mondays and Saturdays.Appendix E: The detailed decensoring procedure for bike trajectory dataTo minimize the seasonality effect, we pick a two-week period with stable and high ride demand to perform decensoring operations. Because there are no docked stations and riders can find available bikes in their neighborhood with smart phones, we split the urban area into small neighborhoods using the geohash script, which is a widely used geocode system 4 . Each neighborhood is about 153 meters by 153 meters according to geohash of length 7. Given a day, we track the evolution of available bikes (e.g., the stock level) every 10 minutes in each neighborhood. Then we identify the stock-out events from the stock evolution. A trajectory originating from neighborhood i in period j would be assigned a weight equals to 1/(# of days with no stock-outs in period j from neighborhood i). After aggregation the weighted trajectory data essentially represents the average ride demand conditional on there are no stock-outs, which shares a similar rationale of O'Mahony and Shmoys (2015).