key: cord-0468008-dp6sq4nq
authors: Fontanelli, Oscar; Guzmán, Plinio; Meneses, Amílcar; Hernández, Alfredo; Flores-Garrido, Marisol; Hernández-Rosales, Maribel; Anda-Jáuregui, Guillermo de
title: Intermunicipal Travel Networks of Mexico (2020-2021)
date: 2022-03-25
journal: nan
DOI: nan
sha: 74682963876a26552867e7f4dc9b592c97523538
doc_id: 468008
cord_uid: dp6sq4nq

We present a collection of networks that describe the travel patterns between municipalities in Mexico between 2020 and 2021. Using anonymized mobile device geo-location data we constructed directed, weighted networks representing the (normalized) volume of travels between municipalities. We analysed changes in global (graph total weight sum), local (centrality measures), and mesoscale (community structure) network features. We observe that changes in these features are associated with factors such as Covid-19 restrictions and population size. In general, events in early 2020 (when initial Covid-19 restrictions were implemented) induced more intense changes in network features, whereas later events had a less notable impact on network features. We believe these networks will be useful for researchers and decision makers in the areas of transportation, infrastructure planning, epidemic control and network science at large.

Intermunicipal mobility is a kind of medium and large scale mobility within a country, in which millions of individuals travel daily from one county or municipality to another for work, shopping, access to public services, cargo loading, vacation, etc. These movements and travels generate complex structures and dynamics of socio-economic interactions between different areas at both regional and national levels. Given the nature of these mobility systems, complex networks have been widely adopted to model these commuting phenomena [6] [10] [23] [25] [35].
Characterizing and understanding the properties of these mobility networks is crucial for decision-making, urban planning, traffic engineering and, as has become clear with the Covid-19 pandemic, for designing, implementing and evaluating mobility restrictions and lockdowns in order to contain or control epidemic spread [3] [17] [26] [34] [38] [40] [45]. Hence the need for public mobility network datasets aimed at researchers and decision-makers in the areas of mobility, urban planning, epidemic control, etc. Traditionally these networks were obtained from mobility surveys. However, the emergence in recent years of cell phone and GPS data has facilitated the acquisition of accurate and large sets of human mobility data, thus allowing the construction and characterization of large and detailed human mobility networks.

Here we introduce a new public dataset of daily intermunicipal origin-destination networks in Mexico for 2020 and 2021, which were constructed directly from large datasets of geolocation data. With this dataset we hope to contribute to research and decision-making communities with diverse interests, from pure network theory to the study of human mobility, urban planning, national scale social and economic relations, epidemic control, etc. As far as we know, this is the first public dataset of its kind for a country in Latin America.

Other works have built origin-destination networks in order to describe mobility patterns, measure the volume of public transportation and plan public transportation, among other things. These mobility patterns include large-scale and long-range commuting patterns [20] [36]; spatio-temporal patterns for different socioeconomic strata [25]; patterns in bike-sharing systems [24], etc.

Table 1: Set of dates for the analysis. We chose these dates to be either a Monday or the Monday closest to the referred event. All of them refer to events in Mexico.
Origin-destination matrices have also been used to measure the volume of use of public transport [2] [44] [28] and for public transport planning [29] [1]. There are several approaches for modeling and generating origin-destination matrices. These approaches include gravity models [42] [39] [16], Bayesian models [31] [32] [43], linear assignment matrix approximation [41], Principal Components Analysis [12] and the gradient approximation method [18], among others.

Origin-destination networks have been built from different types of data. Most of these works use data from public transport systems or from roadside interviews [42]. In works such as [2] and [28] the authors use smart card fare data to estimate origin-destination networks. In [25] the data considered for these matrices are those of a bike-sharing system. In [44] the authors conduct experiments on two datasets generated by ride-hailing applications. In [41], the authors estimate the matrices using data from traffic counts on the network links. Recently, origin-destination networks have also been built from other sources [19] or from mobile phone data sets such as the ACAPS dataset or SafeGraph [37] [7] [11]. In particular, Edsberg et al. [15] show that origin-destination networks based on data from cell phones and social networks present high-quality results.

This article is organized as follows. Results presents and describes the collection of mobility networks and analyses changes in global (sum of weights), local (centrality measures), and mesoscale (community structure) network features. In Discussion we argue that events in early 2020 (when initial Covid-19 restrictions were implemented) induced more intense changes in network features, whereas later events had a less notable impact on network features. In Methods we describe the methodology we used to collect data and present the algorithm for network construction.

We release a public dataset of 731 intermunicipal origin-destination networks in Mexico.
These networks were constructed from a large and anonymized mobile location dataset (see [9] for more information about this dataset). Each network is the intermunicipal origin-destination network in Mexico for one day of the 2020-2021 period. Nodes represent municipalities (third level administrative division) or official metropolitan zones; see the Methods section below. These are weighted and directed networks, where the weight of edge $a_{ij}$ is equal to the total number of observed travels from node $i$ to node $j$, normalized by the number of distinct mobile devices we recorded on that day. The data set with these 731 networks is freely available in an OSF repository: http://dx.doi.org/10.17605/OSF.IO/42XQZ.

For analysis and visualization purposes we chose nine representative dates capturing different important events during the evolution of the pandemic in Mexico; these are shown in Table 1. As a first visualization, we show in Fig. 1 the mobility networks over the map of Mexico for this set of dates, marking only the 1% of edges with the highest weight.

Changes in graph total weight sums were observed throughout the first two years of the pandemic

As a measure to quantify the total observed movement in each network of the collection, we consider the total sum of weights in the network, $S_G = \sum_i w_i$, where $i$ runs over all nodes of the network. In our context, a higher value of $S_G$ can be understood as higher mobility between municipalities in the country, which, in turn, is associated with people's decisions to move outside their locality. Fig. 2 shows the time series of the $S_G$ parameter, also indicating the set of dates described in Table 1 and official school vacation periods (summer breaks, winter breaks and Easter holidays, shaded in gray).
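The total weight sum defined above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' code: the edge dictionary `toy_network` and the function names are hypothetical, and $w_i$ is assumed to be the out-strength of node $i$ (the sum of the weights of the edges leaving $i$), so that $S_G$ equals the sum of all edge weights. If $w_i$ were instead the combined in- and out-strength, every value of $S_G$ would simply double.

```python
from collections import defaultdict


def out_strengths(edges):
    """Return {node: sum of the weights of the edges leaving that node}.

    `edges` maps (source, target) pairs to weights (hypothetical input format).
    """
    strength = defaultdict(float)
    for (source, target), weight in edges.items():
        strength[source] += weight
        strength[target] += 0.0  # register nodes that only receive travel
    return dict(strength)


def total_weight_sum(edges):
    """S_G under the assumption that w_i is the out-strength of node i."""
    return sum(out_strengths(edges).values())


# Toy directed, weighted network with three nodes (illustrative values only).
toy_network = {("A", "B"): 0.25, ("B", "A"): 0.125, ("A", "C"): 0.5}
s_g = total_weight_sum(toy_network)
print(s_g)
```

Computing this quantity for each of the daily networks would give one point per day of a time series like the one described for Fig. 2.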
It can be observed that the decay in mobility that happens in January 2020, probably due to the post-holiday season, is prolonged after the start of the lockdown, reaching a local minimum shortly before the beginning of the summer holiday season. Following the summer break mobility continues to decay, reaching its lowest point again shortly before the beginning of the winter break, when there was a pronounced rise in mobility. For the first half of 2021 we observe a sustained rise in mobility until July 2021, when it reaches a relatively high plateau.

Since the networks exhibit a different mobility pattern each day, the centrality measures associated with each node (locality) also change. In this section we explore the variability of centrality measures over the period we studied. We choose three different centrality measures: (a) degree centrality, (b) node strength and (c) betweenness centrality.

In order to explore how the variability observed in node centrality measures relates to the population of the area each node represents, we analyzed the relationship between the three centrality measures considered and the population reported in the 2020 national census (https://www.inegi.org.mx/programas/ccpv/2020/default.html#Datos_abiertos). Table 1 shows estimated parameters when fitting a linear regression model, while Fig. 6 shows the identified relationship on a logarithmic scale.

Formation and evolution of communities in the network.

Recall that a community in a network is a set of nodes with a larger density of connections among them than with nodes outside the set. Communities in these mobility networks therefore correspond to groups of municipalities (or metropolitan zones) with high internal mobility (within the group) and relatively low external mobility (from the group to the outside or the other way around). Community detection algorithms in mobility or commuting networks have been widely used to detect or delimit geographical functional regions [13] [14] [21] [22] [27].
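One common way to find communities of this kind is label propagation: every node starts with its own label and repeatedly adopts the label that carries the most weight among its neighbours. The Python sketch below is a simplified, deterministic variant written for illustration only. It is not the implementation used to produce the results in this paper, and its choices (ignoring edge direction, visiting nodes in sorted order, breaking ties by keeping the current label or else taking the smallest label) are assumptions made so that the example is reproducible. Standard implementations typically visit nodes in random order and break ties at random.

```python
from collections import defaultdict


def label_propagation(edges, max_sweeps=100):
    """Assign a community label to each node of a weighted network.

    `edges` maps (source, target) pairs to weights. Direction is ignored:
    the weights of i->j and j->i are added (an assumption of this sketch).
    """
    adjacency = defaultdict(lambda: defaultdict(float))
    for (u, v), weight in edges.items():
        if u == v:
            continue  # self-loops carry no information about neighbours
        adjacency[u][v] += weight
        adjacency[v][u] += weight

    labels = {node: node for node in adjacency}  # one label per node at start
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adjacency):  # fixed order keeps the run deterministic
            score = defaultdict(float)
            for neighbour, weight in adjacency[node].items():
                score[labels[neighbour]] += weight
            best = max(score.values())
            candidates = sorted(lab for lab, s in score.items() if s == best)
            # Keep the current label on ties, otherwise take the smallest one.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # no label changed during a full sweep
    return labels


# Two tightly connected triangles joined by one weak edge (toy data).
toy_edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("f", "d"): 1.0,
    ("c", "d"): 0.1,
}
communities = label_propagation(toy_edges)
print(communities)
```

In this toy example the weak edge between `c` and `d` is not enough to merge the two triangles, which mirrors the idea of regions with high internal and low external mobility.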
We used the label-propagation algorithm to detect communities in our networks. The alluvial plot in Fig. 7 shows schematically the time evolution of the community structure of the networks. In this diagram each line represents a municipality in Mexico, and lines are grouped according to the network community to which they belong. We show this structure for the chosen set of dates. The color of each line represents the state to which the municipality belongs.

The sum of total (in and out) weights over all edges of these mobility networks is a gross indicator of the total intermunicipal mobility that we observe on each day. We can see in Fig. 2 that there is a pre-pandemic peak in late February to early March 2020 (the first confirmed case in Mexico was on February 24, 2020). This peak is followed by a persistent decay, whose rate does not seem to be affected by the beginning of lockdowns. In fact, this decay continues until June-July of 2020, when we observe a peak that roughly coincides with the official summer break. After the summer break we still see a downward trend until the winter break and, again, we see a peak that coincides with the official vacation period. In late February 2021 we observe the lowest point of mobility, and from there a sustained upward trend, apparently unaffected by the Easter holidays, until mobility reaches a new maximum that is even higher than the pre-pandemic peak. At this point mobility reaches a high plateau and, even though mobility still fluctuates, these fluctuations do not seem to be correlated with either the third wave of the pandemic or vacation periods. From July 2021 total intermunicipal mobility stays at high levels, comparable with pre-pandemic levels.

Figure 8: Alluvial plots for network communities inside four states in Mexico. Each line represents a municipality or metropolitan zone; lines are coloured according to their initial community and grouped according to the community to which they belong.

Centrality measures evolve over time.
How they change seems to be only weakly associated with the evolution of the pandemic. We also observe that not all nodes exhibit the same behavior. While some nodes, such as Guadalajara, Puebla-Tlaxcala and Valle de Mexico, show larger fluctuations in their degree centrality, other nodes such as San Pedro Topiltepec or Morelia exhibit much flatter time series. Changes in betweenness centrality for Valle de Mexico are very different from those for the other nodes. The evolution of degree and strength in Valle de Mexico seems to be correlated, but both differ from the evolution of betweenness.

A relatively simple intuition is that municipalities with larger populations will have higher values of centrality measures regardless of time. We do see that all of the analysed centrality measures are correlated with population size. However, we observe that this correlation does not scale in the same way for all of them: some of these metrics show a better linear correlation with population on a linear scale, while others show a better linear correlation on a logarithmic scale (see Appendix for model fits). A related question is whether variations in these centrality measures are also correlated with population size. We observe a negative correlation between the coefficient of variation and population, showing that centrality variation is higher in smaller municipalities. Again, the best fit for this correlation is not necessarily found on the linear scale (see Appendix). In any case, it should be noted that while population size is clearly a relevant factor, the dispersion of the point cloud in the range of mid-sized municipalities indicates that other factors may be needed to fully explain centrality measures in these networks.

Regarding the formation of communities in the network, we observe that, broadly speaking, for each day in this analysis there is a "giant" community which includes about half or more of the nodes in the network, and that this structure seems to be preserved for all dates (see Fig. 7).
Therefore, if there are changes in the community structure of intermunicipal mobility, these have to occur at more local levels. In this diagram lines (nodes in the network) are colored according to their state. For example, all light-blue lines starting at the bottom-left corner of the diagram correspond to Oaxaca municipalities, while the green lines just above are Puebla municipalities. Notice here how all nodes in Puebla tend to stay inside the same community (the largest one), whereas municipalities in Oaxaca move to different communities, indicating a change in the structure of intermunicipal mobility within this state.

The alluvial plot in Fig. 7 shows the community structure and its time evolution for the whole country. We show in Fig. 8 local versions of these plots for four different states. Campeche and Nuevo León are among the top three states in the country for Gross Domestic Product per capita (the other being Mexico City, which is a single node in this analysis and therefore lacks true intra-state community structure, since all its municipalities are collapsed into the same node); on the other hand, Oaxaca and Chiapas are the two states with the lowest GDP per capita.

We used mobile device location data for the time period between 2020-01-01 and 2021-12-31 within Mexican territory provided by Veraset, a company that aggregates anonymized mobile device location data. This source dataset is provided as a table in which each record (called a ping) contains the position (latitude and longitude) of a given (anonymized) device at a given timestamp (with temporal resolution up to seconds). The set of all unique device ids for a given day is called the device panel.
We define an Intermunicipal Travel Network (IMTN) as a directed, weighted graph G(V,E), for a given day, such that:
• Nodes represent localities (either municipalities or metropolitan zones, as defined by the national geographic agency; see Note 1).
• Links represent mobility from the source node to the target node, defined by observing at least one device that moved from node i to node j.
• Link weights represent the fraction of observed devices that moved from node i to node j, out of the total number of observed devices; this acts as a normalized measure of flow between nodes (see Note 2).

The political division of Mexico has municipalities as the smallest unit. Generally, a population center is contained within a municipality; however, there are large urban areas in which a single population center extends through many different municipalities, such that movement across municipal boundaries captures urban mobility and not travel between different locations. The National Geography and Statistics Institute (INEGI) defines 74 metropolitan areas in Mexico, based on measurements from 2015 (https://www.inegi.org.mx/contenido/productos/prod_serv/contenidos/espanol/bvinegi/productos/nueva_estruc/702825006792.pdf). Since intracity mobility is beyond the scope of this manuscript (and would greatly skew the mobility metrics, as the volume of intracity mobility is much larger than that of true travels between different locations), we decided to aggregate the municipalities that form these metropolitan areas into single nodes in the network.

We defined edge weights in the network as follows:

$$ w_{ij} = \frac{|D_{ij}|}{|D|} $$

where $|D_{ij}|$ is the number of devices that were observed to move from $i$ to $j$ (that is, were observed in $i$ and their next immediate ping was in $j$) and $|D|$ is the total number of devices in the day's device panel.
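The edge-weight definition above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' construction algorithm: the function name `build_daily_network` and the input format are hypothetical, and each ping is assumed to have already been assigned to a network node (municipality or metropolitan zone), so the geographic lookup from latitude and longitude is not shown.

```python
from collections import defaultdict


def build_daily_network(pings):
    """Build one day's weighted, directed network from pings.

    `pings` is an iterable of (device_id, timestamp, node) tuples for a
    single day, where `node` is the locality that contains the ping
    (hypothetical, pre-processed input).
    Returns {(i, j): |D_ij| / |D|}.
    """
    by_device = defaultdict(list)
    for device_id, timestamp, node in pings:
        by_device[device_id].append((timestamp, node))

    panel_size = len(by_device)  # |D|: distinct devices seen that day

    movers = defaultdict(set)  # (i, j) -> set of devices that moved i -> j
    for device_id, records in by_device.items():
        records.sort()  # chronological order
        for (_, here), (_, there) in zip(records, records[1:]):
            if here != there:  # consecutive pings in different nodes
                movers[(here, there)].add(device_id)

    return {edge: len(devices) / panel_size for edge, devices in movers.items()}


# Toy day with four devices (illustrative values only).
toy_pings = [
    ("d1", 1, "A"), ("d1", 2, "B"),
    ("d2", 1, "A"), ("d2", 5, "A"),
    ("d3", 3, "A"), ("d3", 4, "B"), ("d3", 9, "A"),
    ("d4", 1, "C"),
]
network = build_daily_network(toy_pings)
print(network)
```

Using a set of devices per edge means that a device that repeats the same trip several times in one day is counted once for that edge, which follows the definition of $|D_{ij}|$ as a number of devices. Devices that never change node still count towards $|D|$.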
In this way, the weight represents a normalized measure of flow between regions. In the limit cases, the weight is zero when no movement is observed from one municipality to the other, and the weight would be 1 if all observed devices within the country travelled from region i to region j (which would be a virtually impossible scenario). An advantage of this approach is that it controls for variability in the number of observed devices each day, allowing for comparisons between days.

The following pseudocode shows the algorithm used for network construction, starting with the daily mobile device location data. [Pseudocode listing missing from the extracted text.]

Networks were analyzed using the igraph library, version 1.2.7 [8], for the R programming language, version 4.1.0.

The collection of 731 intermunicipal networks is publicly available in an OSF repository: http://dx.doi.org/10.17605/OSF.IO/42XQZ.

Table 1 entries (date, event):
2020-06-01  Beginning of Epidemiological Stoplight Program.
2020-07-30  First national peak of daily contagions.
2020-09-21  Local minimum of daily contagions (between first and second wave).
2021-01-19  Second national peak of daily contagions.
2021-05-24  Local minimum of daily contagions (between second and third wave).

References
[1] The value of additional data for public transport origin-destination matrix estimation
[2] Use of smart card fare data to estimate public transport origin-destination matrix
[3] Mapping the intercounty transmission risk of Covid-19 in New York State, Available at SSRN 3582774
[4] Economic and social consequences of human mobility restrictions under Covid-19
[5] Origin-destination estimation using mobile network probe data
[6] Accessibility and complex network analysis of the US commuting system
[7] Mobility network models of Covid-19 explain inequities and inform reopening
[8] The igraph software package for complex network research
[9] The contact and mobility networks of Mexico City
[10] Modeling commuting systems through a complex network analysis: A study of the Italian islands of Sardinia and Sicily
[11] Network percolation reveals adaptive bridges of the mobility network response to Covid-19
[12] Efficient real time OD matrix estimation based on principal component analysis
[13] Comparison of two network-theory-based methods for detecting functional regions, Business Systems Research: International journal of the Society for Advancing Innovation and Research in Economy
[14] Delineating metropolitan areas: Measuring spatial labour market networks through commuting patterns
[15] Understanding components of mobility during the Covid-19 pandemic
[16] Estimating origin-destination matrix of Bogor City using gravity model
[17] Optimal lockdown in a commuting network
[18] New gradient approximation method for dynamic origin-destination matrix estimation on congested networks
[19] Human mobility in response to Covid-19 in France, Italy and UK
[20] Unveiling large-scale commuting patterns based on mobile phone cellular network data
[21] Demarcating geographic regions using community detection in commuting networks with significant self-loops
[22] Hierarchical community detection and functional area identification with OSM roads and complex graph theory
[23] A universal model of commuting networks
[24] Human mobility in bike-sharing systems: Structure of local and non-local dynamics
[25] Rich do not rise early: spatio-temporal patterns in the mobility networks of different socio-economic classes
[26] Heterogeneous impact of a lockdown on inter-municipality mobility
[27] Regional delineation of China based on commuting flows
[28] Estimation of a disaggregate multimodal public transport origin-destination matrix from passive smartcard data from Santiago, Chile
[29] Origin-destination matrix generation using smart card data: Case study for Izmir
[30] García-Palomares, Social media and urban mobility: Using Twitter to calculate home-work travel matrices
[31] A Bayesian approach for modeling origin-destination matrices
[32] A dynamic hierarchical Bayesian model for the estimation of day-to-day origin-destination flows in transportation networks
[33] Trip distribution modeling with Twitter data, Computers, Environment and Urban Systems
[34] Population mobility reductions during Covid-19 epidemic in France under lockdown
[35] Using the weighted rich-club coefficient to explore traffic organization in mobility networks
[36] Networks and long-range mobility in cities: A study of more than one billion taxi trips in New York City
[37] Covid-19 lockdown induces disease-mitigating structural changes in mobility networks
[38] Commuting network spillovers and Covid-19 deaths across US counties
[39] A universal model for mobility and migration patterns
[40] Quantifying the influence of inter-county mobility patterns on the Covid-19 outbreak in the United States
[41] Estimation of dynamic origin-destination matrices using linear assignment matrix approximations
[42] Origin-destination trip matrix development: Conventional methods versus mobile phone data
[43] Sensor location model to optimize origin-destination estimation with a Bayesian statistical procedure
[44] Origin-destination matrix prediction via graph convolution: a new perspective of passenger demand modeling
[45] Covid-19 spread and inter-county travel: Daily evidence from the US, Transportation Research Interdisciplinary Perspectives