key: cord-138886-8zwjdlrt
authors: Xu, Yanyan; Olmos, Luis E.; Abbar, Sofiane; Gonzalez, Marta C.
title: Deconstructing laws of accessibility and facility distribution in cities
date: 2020-07-17
journal: nan
DOI: nan
sha:
doc_id: 138886
cord_uid: 8zwjdlrt

The era of the automobile has seriously degraded the quality of urban life through costly travel and visible environmental effects. A new urban planning paradigm must be at the heart of our roadmap for the years to come: one where, within minutes, inhabitants can access their basic living needs by bike or on foot. In this work, we present novel insights into the interplay between the distributions of facilities and population that maximize accessibility over the existing road networks. Results in six cities reveal that travel costs could be cut in half by redistributing facilities. In the optimal scenario, the average travel distance can be modeled as a function of the number of facilities and the population density. As an application of this finding, it is possible to estimate the number of facilities needed to reach a desired average travel distance given the population distribution in a city.

At a time when the effects of climate change on urban life are highly visible, some cities have become unbreathable, and greenhouse gases are emitted by building heating and cooling networks and by petrol-powered transport. At a time when transport has become the largest emitter of CO2, we need to imagine and propose other ways of occupying urban space. This calls for a better understanding of the spatial distributions of facilities and population (1-7). The information age and the online mapping revolution allow us to study globally the interactions of humans with their built and natural environment (8-13).
Pioneering multi-city studies have uncovered scaling laws relating population to the distribution of facilities and socio-economic activities at the macroscopic scale (3, 6, 14-16). It has been asserted, for example, that more populated cities are more efficient in their per capita consumption (3, 4) and that their occupational diversity can be modeled as social networks embedded in space (10). Yet a systematic understanding of the interplay between urban form, facility distribution, and accessibility at multiple scales remains elusive. At the country scale, when maximizing the accessibility of the population to a fixed number of facilities, Gastner and Newman demonstrated a simple 2/3 power law between the optimal density of facilities d and the population density ρ (17). The power law was fitted by allocating 5,000 facilities in the continental U.S. using population data from more than 8 million census blocks. In this case, each facility covers an area about the size of a county (∼1,000 km²). In a follow-up study, Um et al. proposed distinct optimization goals to differentiate public services, such as fire stations and public schools, from commercial facilities, such as banks and restaurants (18). Public service facilities, which aim to minimize the overall distance between people and the facilities, follow $d \propto \rho^{2/3}$. However, for profit-driven facilities, whose goal is to maximize the number of potential customers, the power law has an exponent close to 1, that is, $d \propto \rho$. The authors found that the analytical optimization aligns with the empirical distributions in the U.S. and South Korea, confirming the 2/3 exponent for public services and the exponent of 1 for profit-driven facilities. The simple power law at city scale reveals the equilibrium of the empirical allocation of resources across cities with different populations.
However, distributing facilities at a fine scale within cities, where the coverage area per facility is a few blocks (∼10 km²), has to deal with more heterogeneous settlements of population with different socio-economic characteristics. Studies of accessibility within cities merit attention for science-informed land use planning and for the redistribution of public services after disasters and evacuations (19-23). Forward-looking approaches for planning facilities in cities would also consider individuals' preferences for facilities by mining mobility patterns. Zhou et al. introduced a location-based social network dataset to derive the demand for different types of cultural resources, and identified the urban regions that lack venues (24). While efforts have been devoted to the optimal allocation problem in specific cities (24-27), a systematic understanding of the optimal distribution of facilities is still lacking from the urban science perspective. To contribute in this direction, we propose a multi-city study that measures the accessibility of city blocks to different types of facilities through their road networks, and we investigate the role of population distributions. While at large scale travel cost can be substituted by the Euclidean distance from residents to the facilities, road networks and geographic constraints play important roles for human mobility within cities (28-31). It is well established that road network properties impact the daily journeys of residents (32-34), the urban form (35, 36), and accessibility (29, 37). As a complement to most studies, which are devoted to the travel costs of commuters, we analyze in this work the road network distance of individuals to the nearest amenity of various types, dividing the space into high-resolution blocks of constant area of 1 km².
For each city and facility type, we optimally redistribute the existing facilities and compare the result with their empirical distribution. We observe that in the redistribution some blocks increase their accessibility and others decrease it. This implies that, in order to make the best use of the existing facilities for a more equitable accessibility, some blocks would benefit whereas others would have facilities removed. At the city level, the gap between the empirical facility distribution and the optimal planning offers the opportunity to assess the planning quality of facilities in diverse cities. We also revisit the power law between facility and population densities, and observe that the two-thirds power law is not followed by the empirical cases, and that it is observed in the optimal scenario only when the number of facilities is small compared to the total number of blocks in the city. We further investigate optimal distributions of facilities by modeling the average travel distance in different cities as a function of the number of facilities to assign. A model of this quantity is derived on both synthetic and real-world cities and fits different cities well with only two free parameters. Furthermore, this gives us a universal function between the average travel distance and the number of facilities, controlled by the urban form derived from the population distribution. As an application case, we estimate the number of facilities required to achieve a given accessibility via the proposed function in 12 real-world cities.

We select three cities in the U.S. (Boston, Los Angeles, New York City) and three cities in the Gulf Cooperation Council (GCC) countries (Doha, Dubai, Riyadh) to study the empirical distribution of facilities.
For each city we collect the population in blocks with a spatial resolution of 30 arc-seconds (about 1 km² near the equator) from LandScan (38), road networks from OpenStreetMap (39), and facilities from the Foursquare service application (40). These novel, rich, and publicly available datasets have proven valuable in transportation planning (34, 41, 42), land use studies (43, 44), and human activity modeling (45-47). The boundary of each city is drawn along the metroplex, encompassing both urban and rural regions. Fig. 1 depicts the road network, population density, and ten selected types of facilities (e.g., hospitals, schools) in New York City (NYC) and Doha. The statistical information of the six cities is summarized in Table 1. For clarity, all variables and notations introduced in this work are summarized in note S1. The distribution of all available facilities in the Foursquare data for the six cities is presented in fig. S1. Details of the data sets are described in the Materials and Methods and note S2. It can be observed that Doha and Dubai have more facilities located in the highly populated areas, whereas Boston has the majority of its facilities located near the city center, where fewer people reside. A discrepancy between the distributions of population and facilities can also be observed in Los Angeles (LA), NYC, and Riyadh. In these three cities, the population density peaks near the city center, but the facilities are distributed more uniformly across the city. It is noteworthy that, for calculating the facility density and the total number of facilities, we first merge facilities of the same type (e.g., hospitals) located in the same block into one facility. Thus, the number of facilities hereafter refers to the number of blocks accommodating a given type of facility, denoted by N. We define $N_{max}$ as the total number of blocks in one city.
Furthermore, as nearly unpopulated blocks carry little weight in the calculations of accessibility, we define $N_{occ}$ as the number of occupied blocks, that is, blocks with population over a threshold. We set the threshold to 500 in real-world cities, a value commonly used to distinguish between urban and rural regions. The ratio between the number of blocks occupied by facilities N and the number of populated blocks $N_{occ}$ is denoted by $D_{occ}$. Table 1 reports the $D_{occ}$ of the ten selected types of facilities in the six cities of study. As an example, the $D_{occ}$ of hospitals in Boston equals 0.11, indicating that about 11% of populated blocks are occupied by hospitals. To quantify the accessibility of the population to facilities, previous work used the Voronoi cell around each facility as a proxy for the tendency of individuals to select the closest facility in Euclidean distance (17, 18). However, within cities, the distance that people travel in the road networks is constrained by the infrastructure and the landscape. In this context, the routing distance is a better proxy for the accessibility from the place of residence to each amenity. Fig. S2C compares the distributions of routing distance for the actual and optimal locations of facilities with the corresponding Euclidean distances. Interestingly, our findings confirm that the optimal strategy based on Euclidean distance achieves costs similar to those of the actual distribution of facilities, and is much less effective than the strategy that optimizes for routing distance. Accessibility indicates the level of service of facilities to the residents. In network science, accessibility is defined as the ease of reaching points of interest within a given cost budget (48-50). How to allocate the facilities to maximize the overall accessibility in cities is one of the most essential concerns of facility planning.
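The block-level counts defined above can be illustrated with a short sketch. It assumes a hypothetical dictionary of block populations and a hypothetical list of the blocks in which hospitals were found; the function name `occupancy_ratio` and the toy numbers are illustrative only and are not taken from the study data.

```python
def occupancy_ratio(block_population, facility_blocks, threshold=500):
    """Return (N, N_occ, D_occ) for one facility type.

    block_population: {block_id: population} for all N_max blocks (hypothetical input).
    facility_blocks:  block ids of individual facilities; repeats are allowed.
    threshold:        population above which a block counts as occupied.
    """
    # Facilities of the same type in the same block are merged into one,
    # so N is the number of distinct blocks holding that facility type.
    n = len(set(facility_blocks))
    # N_occ counts blocks whose population exceeds the threshold.
    n_occ = sum(1 for pop in block_population.values() if pop > threshold)
    if n_occ == 0:
        raise ValueError("no occupied blocks: D_occ is undefined")
    return n, n_occ, n / n_occ


# Toy city with six blocks (made-up numbers).
block_population = {"b0": 1200, "b1": 300, "b2": 800, "b3": 0, "b4": 5000, "b5": 650}
# Two hospitals share block b0 and are merged; a third sits in b4.
hospital_blocks = ["b0", "b0", "b4"]

n, n_occ, d_occ = occupancy_ratio(block_population, hospital_blocks)
print(f"N = {n}, N_occ = {n_occ}, D_occ = {d_occ:.2f}")
```

In this toy case, four of the six blocks exceed the threshold and two distinct blocks hold a hospital, so the ratio is one half.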
From this point of view, we redistribute the facilities by minimizing the total routing distance of the population to their nearest facilities. In the following, we refer to this redistribution as the optimal scenario. Likewise, the empirical distribution of facilities is referred to as the actual scenario. Specifically, among the $N_{max}$ blocks of one city, we denote as facility-tagged the N blocks that are occupied by a given type of facility in the actual scenario, and we redistribute the same number of facilities in the optimal scenario. The shortest distance between any pair of blocks is calculated with Dijkstra's algorithm in the road network. The idea is to find a new set of N blocks and label them as facility-tagged such that the total population-weighted travel distance from all $N_{max}$ blocks to the newly selected N blocks is minimized. This optimal allocation problem in networks is known as the p-median problem, and here it is solved with an efficient algorithm proposed by Resende and Werneck (51) (Materials and Methods). The difference in travel distance between the actual and the optimal scenarios assesses the quality of the distribution and, therefore, of the accessibility in different cities. In each scenario, each residential block is associated with the facility that can be reached in the shortest routing distance. The block is linked to itself if it is occupied by a facility. It is important to note that we do not consider the capacity of facilities as a constraint in the present study, i.e., the number of people using the same facility is not limited. We group the set of blocks served by the same facility and define them as a service community. Considering hospitals as an example, we present in Figs. 2A and 2B the service communities in Boston in the actual and optimal scenarios, respectively.
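The comparison between the actual and optimal scenarios can be sketched on a toy network. The example below assumes that every block is a single node on a five-block line with 1 km links; the helper names (`dijkstra`, `total_cost`, `service_communities`, `swap_local_search`) are hypothetical, and the plain interchange heuristic is a simplified stand-in for illustration, not the Resende and Werneck implementation used in the study.

```python
import heapq


def dijkstra(graph, source):
    """Shortest road distance from source to all nodes; graph is {u: {v: length}}."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def total_cost(dist, population, facilities):
    """Population-weighted distance from every block to its nearest facility block."""
    return sum(p * min(dist[b][f] for f in facilities) for b, p in population.items())


def service_communities(dist, population, facilities):
    """Group blocks by the facility block reached in the shortest routing distance."""
    ordered = sorted(facilities)
    groups = {f: [] for f in ordered}
    for b in population:
        groups[min(ordered, key=lambda f: dist[b][f])].append(b)
    return groups


def swap_local_search(dist, population, start):
    """Interchange heuristic: swap one open block for a closed one while the cost drops."""
    current, best = set(start), total_cost(dist, population, start)
    improved = True
    while improved:
        improved = False
        for out in sorted(current):
            for cand in sorted(population):
                if cand in current:
                    continue
                trial = (current - {out}) | {cand}
                cost = total_cost(dist, population, trial)
                if cost < best:
                    current, best, improved = trial, cost, True
                    break
            if improved:
                break
    return current, best


# Toy city: five blocks on a line, 1 km apart, with a dense central block (made-up data).
graph = {i: {} for i in range(5)}
for i in range(4):
    graph[i][i + 1] = 1.0
    graph[i + 1][i] = 1.0
population = {0: 100, 1: 100, 2: 1000, 3: 100, 4: 100}
dist = {b: dijkstra(graph, b) for b in graph}

actual = {0, 1}                                   # actual scenario, N = 2
actual_cost = total_cost(dist, population, actual)
optimal, optimal_cost = swap_local_search(dist, population, actual)
communities = service_communities(dist, population, optimal)
print(optimal, communities, optimal_cost / actual_cost)
```

A local search of this kind is not guaranteed to reach the global optimum in general; on this small instance it can be checked against exhaustive enumeration of all pairs of blocks.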
The color of each cluster depicts the total population $p^S_j$ in the service community.

In order to quantify the disparities between blocks in the level of service for a given type of facility, we compare the actual and optimal travel distances to facilities. We define a gain index of the ith block as:

$$ r_i = \frac{l_i}{\tilde{l}_i} \qquad (1) $$

where $\tilde{l}_i$ and $l_i$ are the shortest travel distances from the ith block to its nearest facility in the actual and optimal scenarios, respectively. A value $r_i > 1$ identifies that the block is better served by the facility in the actual than in the optimal scenario. Residents living in these blocks benefit more from the distribution of facilities than they would in the scenario of social optimum. In [...] Table 1). In the GCC cities, fire stations are the most equitably distributed facilities, while bars, hospitals, parks, and pharmacies are distributed less equitably than others. The Lorenz curves and the values of the Gini coefficients per facility type are presented in fig. S4B. The three cities in the U.S. are generally planned more equally than the GCC cities.

Thereafter, we compare the difference in accessibility across cities to various facility types. [...] (L) to the ten selected types of amenities. The first row displays the facilities with higher densities in the U.S. cities: banks, pharmacies, schools, parks, and bars. Next come hospitals and supermarkets, followed by concert halls, soccer fields, and fire stations, which have the lowest densities. As expected, the lower the density, the longer the travel distance. Note that the accessibility to parks, fire stations, and bars shows the largest differences between U.S. and GCC cities, mainly due to lower availability in the latter. To compare the travel distance in different cities on the same footing, we show the scatterplots of $\tilde{L}$ and L versus $D_{occ}$, the ratio between N and $N_{occ}$, in Figs. 3B and 3C, respectively. The discrepancy of the actual travel distance $\tilde{L}$ among the six cities is mainly caused by the difference in facility planning strategy and urban form. As expected, the optimal travel distance L displays a more uniform tendency than $\tilde{L}$, revealing the potential of modeling L with the number of facilities N.

Box plot of the optimality index, R, by facility type. Facility types are ranked by their average densities in the six cities in descending order. Among the facility types, fire stations are the most optimally distributed and banks and schools the worst. In general, facility types with lower density are better located than dense facility types from the perspective of collective benefit maximization.

An interesting measure is the improvement of overall accessibility if the locations of facilities are optimally redistributed at city scale. To that end, we define the optimality index R for a given type of facility at city level as the ratio between the average travel distance to the nearest facilities in the optimal and actual scenarios,

$$ R = \frac{\sum_{i=1}^{N_{max}} p_i\, l_i}{\sum_{i=1}^{N_{max}} p_i\, \tilde{l}_i} \qquad (2) $$

where $p_i$ is the population in the ith block. R ranges from 0 to 1, with 1 indicating that the facilities are optimally distributed in reality. In note S3 and fig. S4C, we discuss the change of R with $N/N_{max}$ by introducing two extreme planning strategies, random and population-weighted assignments, described in note S3. We observe that the R score of actual planning lies mostly between the two extreme strategies, except in Riyadh, where R is even lower than for random assignment. This suggests an imbalance between facility locations and service delivery in Riyadh (53). Besides, we observe that the R score of actual planning is highest when $N/N_{max}$ is smallest, except in LA and NYC. For the two extreme strategies, we observe that R is u-shaped as a [...]

Previous work has related the facility density to the population density as a power function in both the actual and optimal scenarios (18) at the national scale.
Here, through introducing the [...] Specifically, $d^S_j = 1/a^S_j$ and $\rho^S_j = p^S_j/a^S_j$, where $a^S_j$ is approximated by the product of the number of blocks $n^S_j$ and the average block area in the city, that is, $a^S_j = n^S_j \bar{a}$. Taking hospitals as an example, their densities versus the population densities of the service communities in the actual scenario over the six cities are illustrated in Fig. 4A. The solid lines represent power-law functions fitted by least squares to the communities with more than 500 residents. Cities have different exponents, and the r² scores of the fits are below 0.5 in most cases. These results show that, although the 2/3 power law was found for public facilities at county resolution (18), we do not find a uniform law between facility and population densities at finer resolutions, i.e., at the intra-city community level. Once facilities are optimally redistributed in the city, the service communities are reorganized accordingly. The fitted power laws between the distribution of hospitals and population in the optimal scenario of the six cities are shown in Fig. 4B. The fitted exponents are closer to 2/3 and have larger r², and the 95% confidence intervals are narrower than those in Fig. 4A, which depicts the actual scenario. The exponents for the ten selected types of facilities in the actual and optimal scenarios are reported in table S1. As expected, cities have different exponents in both the actual and optimal scenarios. In all cases, we observe that the optimal exponents deviate from the analytical 2/3 previously reported when the facilities are optimally distributed by Euclidean distance at the national scale (17). The sources of the difference are both the constraints introduced by the road networks and the higher density of facilities to be distributed. For a comprehensive understanding of the existence of the power laws, we optimally allocate a varying number of facilities N in our six cities of study and in synthetic cities. In Figs.
4C and 4D, we relate the exponent β to $D_{occ}$, the ratio of N to $N_{occ}$, and observe β ≈ 2/3 when $D_{occ}$ < 0.2 (0.1) for the real-world (synthetic) cities. We simulate controlled scenarios via four synthetic, or toy, cities of size 100×100, with the population distributions depicted in Fig. 5A. Note that the population threshold used to count $N_{occ}$ is set to 50 in toy cities, and the total population is fixed at half a million, which is about 1/10 of that of the studied cities. We find that the curves of diverse cities collapse into a single one, indicating that the difference in the change of β across cities is mainly caused by different $N_{occ}$. Interestingly, in the toy cities, we notice that the change of β is not monotonic. It stays around 2/3 when $D_{occ}$ is below 0.1. Subsequently, β decreases with $D_{occ}$ as more facilities are assigned to the low density regions and then increases as facilities start to refill the high density regions. After all high density blocks are assigned facilities, β starts to drop to zero, implying that all blocks are filled with facilities. The same fluctuation of β is not clearly observed in real-world cities because the large and low density regions are not segregated as in the synthetic cities. In summary, in the optimal scenario, the 2/3 power law can be found for a limited number of facilities, but tends to disappear for larger values of N.

In Fig. 3B we see that $D_{occ}$ is the most decisive factor in decreasing the average distance $\tilde{L}$ to a facility, independent of its type and city. Interestingly, in Fig. 3C we observe that these de- [...]

For an estimate of the optimal travel distance L in each city, we first assume that the in-block travel distance is constant, $l_{min} = 0.5$ km, and that the average travel distance within a service community is approximately $g^S_j \sqrt{a^S_{j,occ}}$, where $g^S_j$ denotes the geometric factor of the community and $a^S_{j,occ}$ denotes the area of the occupied blocks (17).
Then L is expressed as the sum of two terms, the first for the population in the N blocks with facilities and the second for the population in the $N_{max} - N$ blocks without facilities:

$$ L = \frac{1}{P}\left[ \sum_{j=1}^{N} p_j\, l_{min} + \sum_{j=1}^{N} \bar{p}^S_j\, g^S_j \sqrt{a^S_{j,occ}} \right] \qquad (3) $$

where P is the total population in the city, $p_j$ is the population of the block where the jth facility is located, and $\bar{p}^S_j$ denotes the population in the service community of the jth facility after removing that block, that is, $\bar{p}^S_j = p^S_j - p_j$. We find that $a^S_{j,occ}$ follows a power-law relation to the total area of the community $a^S_j$ in most cities, that is, $a^S_{j,occ} \propto (a^S_j)^{\gamma}$ (fig. S5A). We assume that $g^S_j$ is constant in each city, written as $g_{city}$, and that $a^S_j \approx a^S = \bar{a} \cdot N_{max}/N$, with $\bar{a}$ denoting the average block area in the city. Then we can rewrite Eq. 3 as

$$ L = l_{min}\, p(N) + A N^{-\lambda} \left( 1 - p(N) \right) \qquad (4) $$

where p(N) denotes the share of population in blocks with facilities, and A and λ are both constant. More details of this derivation can be found in note S4.1. We further study how the share of population in blocks without facilities is related to the number of facilities N, and find $1 - p(N) \approx e^{-\alpha N}$ when $N \ll N_{occ}$ (see details in fig. S5B and notes S4.2 and S4.3). We can thereby model L as

$$ L(N) = l_{min} \left( 1 - e^{-\alpha N} \right) + A N^{-\lambda} e^{-\alpha N} \qquad (5) $$

where the number of facilities N is the main variable that determines L. While α controls the relation between p(N) and N, A and λ are two free parameters to calibrate. The model of L(N) summarizes the fact that the only two essential ingredients for modeling L are the number of facilities to allocate, N, and the distribution of population in space. Next, we numerically compute the optimal distribution of facilities for a varying number of facilities in both toy and real-world cities. We present the average travel distance L versus the number of facilities N in the toy and real-world cities in log-log plots in Figs. [...]

To seek a universal function that approximates the simulated L in diverse cities, we treat λ in Eq. 5 as a constant, fixing it at its average empirical value $\bar{\lambda} = 0.382$ in the 12 real-world cities.
Combining the observation that $N_{max}$ is inversely proportional to α with $A \approx g_{city}\, \bar{a}^{\lambda} N_{max}^{\lambda}$ (note S4.1), we can expect that $A \propto \alpha^{-\lambda}$. Figure S7C confirms this, showing that $A = 1.4443\, \alpha^{-\lambda}$. We can rewrite Eq. 5 as follows:

$$ L(\alpha N) = l_{min} \left( 1 - e^{-\alpha N} \right) + 1.4443\, (\alpha N)^{-\bar{\lambda}}\, e^{-\alpha N} \qquad (6) $$

This function with only one free parameter α suggests that we can rescale N with α to collapse the curves of L in all cities into one, as shown in Fig. 5F, which depicts Eq. 6 as a solid line. The same rescaling of N in toy cities is presented in Fig. 5E, where the collapse is not as good as in the real cities because of the divergent values of λ of the toy cities in table S2. Next, we go beyond the average distance L and plot the distribution of travel distances when keeping αN fixed (figs. S7F and S7G). In all cases the travel distance follows a Gamma distribution. This universality suggests that: (i) given a certain αN, all real-world cities can reach comparable accessibility; (ii) the overall accessibility in the optimal scenario depends not only on the availability of the resources but also on the settlement of population, independently of the road network and the total area of the city. Empirically, the decay rate α of the population share in blocks without facilities depends on the population distribution in space. Taking into account that unpopulated blocks are not ideal when optimizing accessibility, $N_{occ}$ is a better variable to express α. A good agreement $\alpha = 1.833/N_{occ}$ (R² = 0.96) over the 12 real-world cities is shown in Fig. 5G, suggesting that α can be estimated from $N_{occ}$. Given that $\alpha = 1.833/N_{occ}$ and the universal relation L(αN), we can explain the collapses found in Figs. 3C, 4C and 4D. As a concrete application of this universal model for the optimal distance to facilities in Eq. 6, we can plan for facilities by, for example, extracting how many facilities are needed for varying levels of accessibility to a given type of service. In this context, the number of facilities N can be estimated with the inverse function of Eq. 6.
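The planning use of the universal model can be sketched numerically. The code below assumes that Eq. 6 has the form $L(\alpha N) = l_{min}(1 - e^{-\alpha N}) + 1.4443\,(\alpha N)^{-\bar{\lambda}} e^{-\alpha N}$, which follows from substituting $A = 1.4443\,\alpha^{-\lambda}$ into the model of L(N), and it uses the constants quoted in the text ($l_{min} = 0.5$ km, $\bar{\lambda} = 0.382$, $\alpha = 1.833/N_{occ}$). The function names and the bisection routine are illustrative choices, not the authors' implementation; only the second term is inverted, which is a reasonable approximation for a limited number of facilities.

```python
import math

L_MIN = 0.5      # km, in-block travel distance assumed in the text
LAMBDA = 0.382   # average empirical exponent over the 12 real-world cities
C = 1.4443       # prefactor in A = 1.4443 * alpha**(-lambda)


def alpha_from_nocc(n_occ):
    """Empirical relation alpha = 1.833 / N_occ reported in the text."""
    return 1.833 / n_occ


def second_term(x):
    """Second term of the assumed Eq. 6, with x = alpha * N."""
    return C * x ** (-LAMBDA) * math.exp(-x)


def universal_distance(x):
    """Assumed form of Eq. 6: average optimal travel distance (km) versus x = alpha * N."""
    return L_MIN * (1.0 - math.exp(-x)) + second_term(x)


def facilities_needed(target_km, n_occ, lo=1e-9, hi=50.0, iters=200):
    """Estimate N for a target average distance by inverting the second term only.

    The second term is strictly decreasing in x, so bisection on a
    logarithmic scale converges to the unique solution inside [lo, hi].
    """
    if not second_term(hi) < target_km < second_term(lo):
        raise ValueError("target distance outside the bracketed range")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        if second_term(mid) > target_km:
            lo = mid                      # distance still too long: more facilities
        else:
            hi = mid
    return math.sqrt(lo * hi) / alpha_from_nocc(n_occ)


# Hypothetical city with 1,000 occupied blocks and a 1 km target distance.
n_est = facilities_needed(target_km=1.0, n_occ=1000)
x_est = n_est * alpha_from_nocc(1000)
print(f"estimated N = {n_est:.1f}, full model L = {universal_distance(x_est):.3f} km")
```

The same solution can be written in closed form with the Lambert W function: setting $u = \alpha N/\lambda$ turns the second term into $u e^{u} = \lambda^{-1} (L/1.4443)^{-1/\lambda}$. The bisection above avoids any dependency beyond the standard library.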
As the second term in Eq. 6 dominates L for a limited N, we simply invert the second term to estimate N, given by N(L; α) = [...] planning. This became ever more evident when distributing healthcare system resources during the outbreak of a pandemic, such as COVID-19 in 2020.

Population density

The population with a spatial resolution of 30 arc-seconds (approximately 1 km² near the equator) of each city was obtained from LandScan (38) [...] S1. We select 10 types of facilities to inspect their actual and optimal planning; they are given in the legend of Fig. 1. After merging facilities of the same type located in the same block, the occupancy of each type of facility, $N/N_{occ}$, is presented in Table 1. The U.S. cities generally have denser facilities than the GCC cities.

Road networks

We extract the road networks from OpenStreetMap (39). The road network is represented as a directed graph, in which edges indicate road segments and nodes indicate intersections. Each edge is associated with a weight representing its length. The travel distance between two blocks is computed by finding the shortest path between two randomly selected nodes in these blocks using Dijkstra's algorithm (55).

Finding the optimal locations of facilities to minimize the total travel cost is essentially an optimal placement problem in network theory, which is NP-hard and known as the p-median problem. The problem in this work is formalized as follows: "Given a set of blocks $N_{max}$ in a city, a set of residential blocks X ∈ $N_{max}$ are with population, and each block in $N_{max}$ can only accommodate one facility. The goal is to open N facilities in $N_{max}$ so as to minimize the sum of population-weighted travel distances from each residential block to its nearest open facility." (56) For simplicity, the p-median problem is written as a linear programming problem.
$$ \min \sum_{i} \sum_{j} c_{i,j}\, x_{i,j} $$

$$ \text{s.t.} \quad \sum_{j} x_{i,j} = 1 \;\; \forall i, \qquad \sum_{j} y_j = N, \qquad x_{i,j} \le y_j \;\; \forall i, j, \qquad x_{i,j},\, y_j \in \{0, 1\} $$

where i and j are indices of the blocks; $x_{i,j} = 1$ means that people living in block i are assigned to their nearest facility in block j, and i = j signifies that there is a facility located in residential block i; $y_j = 1$ if there is a facility in block j, else $y_j = 0$; N is the number of facilities to assign, and we assume that one block can only accommodate one facility of the same type; $c_{i,j}$ is the travel cost from block i to block j, which equals the total routing distance of the whole population residing in block i. In this work, we solve the p-median problem with a fast algorithm based on a swap-based local search procedure implemented by Resende and Werneck (51).

We adopt the urban centrality index (UCI) proposed by Pereira et al. to measure the centrality of the population distribution in cities (54). UCI is the product of two components, the location coefficient (LC) and the proximity index (PI). The former measures the inhomogeneity of the population distribution in space. The latter measures the difference between the current distribution and the most decentralized scenario. LC and PI are calculated as follows:

$$ LC = \frac{1}{2} \sum_{i=1}^{N_{max}} \left| s_i - \frac{1}{N_{max}} \right|, \qquad PI = 1 - \frac{V}{V_{max}} $$

where $V = S^{T} \times D \times S$. S is the vector of population fractions, with $s_i = p_i/P$ signifying the share of the population in block i ($p_i$) in the total population of the city (P); D is the distance matrix between blocks. $V_{max}$ is calculated by assuming that the total population settles uniformly on the boundary of the city, which indicates extreme sprawl. UCI ranges from 0 to 1. Large UCI values indicate more centralized population distributions.

Note S1: List of variables and notations for the accessibility analysis.
Note S2: Data description.
Note S3: On the optimality index.
Note S4: Derivation of the average optimal travel distance.
Table S1. Best fitted exponent of the power law for the actual and optimal distribution of facilities.
Table S2. Fitted parameters of the 17 toy cities and the 12 real cities.
References (57-61).

Modelling urban growth patterns
Urban characteristics attributable to density-driven tie formation
Growth, innovation, scaling, and the pace of life in cities
A unified theory of urban living
Cities, productivity, and quality of life
The origins of scaling in cities
A global map of travel time to cities to assess inequalities in accessibility in 2015
The area and population of cities: new insights from a different perspective on cities
City size, network structure and traffic congestion
Professional diversity and the productivity of cities
The scaling of human interactions with city size
Heterogeneity and scale of sustainable development in cities
Simple spatial scaling rules behind complex cities
Power laws, Pareto distributions and Zipf's law
From global scaling to the dynamics of individual cities
The statistical physics of cities
Optimal design of spatial distribution networks
Scaling laws between population and facility densities
An accessibility-based integrated measure of relative spatial equity in urban public facilities
Measuring the accessibility of services and facilities for residents of public housing in Montreal
Do poorer people have poorer access to local resources and facilities? The distribution of local resources by area deprivation in Glasgow
Is inequality in the distribution of urban facilities inequitable? Exploring a method for identifying spatial inequity in an Iranian city
Toward cities without slums: Topology and the spatial evolution of neighborhoods
Discovering latent patterns of urban cultural interactions in wechat for modern city planning
Accessibility and public facility location
Spatial optimization of residential care facility locations in Beijing, China: maximum equity in accessibility
A data science framework for planning the growth of bicycle infrastructures
Cities as organisms: Allometric scaling of urban road networks
How congestion shapes cities: from mobility patterns to scaling
From mobile phone data to the spatial structure of cities
Measuring the impact of economic well being in commuting networks-A case study of Bogota, Colombia
Emergence of hierarchy in cost-driven growth of spatial networks
Encapsulating urban traffic rhythms into road networks
Understanding congested travel in urban areas
Elementary processes governing the evolution of road networks
A typology of street patterns
Modeling the polycentric transition of cities
OpenStreetMap
The path most traveled: Travel demand estimation using big data resources
Collective benefits in traffic during mega events via the use of information technologies
Mining urban deprivation from foursquare: Implicit crowdsourcing of city land use
Using foursquare place data for estimating building block use
Communicating through location: The understood meaning of the foursquare check-in
Semantic enrichment of movement behavior with foursquare-a visual analytics approach
Unraveling environmental justice in ambient PM2.5 exposure in Beijing: A big data approach
An accessibility-maximization approach to road network planning
Spatial networks
Network structure and city size
A fast swap-based local search procedure for location problems
Geographic distribution of public health hospitals in Riyadh, Saudi Arabia
Urban centrality: a simple index
A note on two problems in connexion with graphs
Theory in practice
Fractal Cities: A Geometry of Form and Function
Location and land use
Urbanization and Urban Problems
Cities and Housing
Urban population densities

This work was supported by the QCRI-CSAIL, the Berkeley DeepDrive (BDD) and the University of California Institute of Transportation Studies (UC ITS) research grants.

YX, LEO, SA and MCG conceived the research and designed the analyses. YX and SA collected the data. YX and LEO performed the analyses. MCG and YX wrote the paper. MCG supervised the research.

The authors declare that they have no competing interests.

All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.