key: cord-0188916-kmbku7r0 authors: Bonato, Pietro; Cintia, Paolo; Fabbri, Francesco; Fadda, Daniele; Giannotti, Fosca; Lopalco, Pier Luigi; Mazzilli, Sara; Nanni, Mirco; Pappalardo, Luca; Pedreschi, Dino; Penone, Francesco; Rinzivillo, Salvatore; Rossetti, Giulio; Savarese, Marcello; Tavoschi, Lara title: Mobile phone data analytics against the COVID-19 epidemics in Italy: flow diversity and local job markets during the national lockdown date: 2020-04-23 journal: nan DOI: nan sha: 36ec911551c7e65d237778253f26fab664df878b doc_id: 188916 cord_uid: kmbku7r0 Understanding collective mobility patterns is crucial to plan the restart of production and economic activities, which are currently on stand-by to fight the spread of the epidemic. In this report, we use mobile phone data to infer the movements of people between Italian provinces and municipalities, and we analyze the incoming, outgoing and internal mobility flows before and during the national lockdown (March 9th, 2020) and after the closure of non-necessary productive and economic activities (March 23rd, 2020). The population flows across provinces and municipalities enable the modelling of a risk index tailored to the mobility of each municipality or province. Such an index would be a useful indicator to drive counter-measures in reaction to a sudden reactivation of the epidemic. Mobile phone data, even when aggregated to preserve the privacy of individuals, are a useful data source to track the evolution in time of human mobility, hence allowing for monitoring the effectiveness of control measures such as physical distancing. We address the following analytical questions: How does the mobility structure of a territory change? Do incoming and outgoing flows become more predictable during the lockdown, and what are the differences between weekdays and weekends? Can we detect proper local job markets based on human mobility flows, to eventually shape the borders of a local outbreak?
Understanding human mobility patterns is crucial to plan the restart of production and economic activities, which are currently on "stand-by" to fight the spread of the epidemic. A recent analysis shows that, following the national lockdown of March 9th, mobility flows have decreased by 50% or more everywhere in the country [13]. To this purpose, we use mobile phone data to compute the movements of people between Italian provinces, and we analyze the incoming, outgoing and internal mobility flows before and during the national lockdown (March 9th, 2020) and after the closure of non-necessary productive and economic activities (March 23rd, 2020). The population flows across provinces and municipalities enable the modeling of a risk index tailored to the mobility of each municipality or province. Such an index would be a useful indicator to drive counter-measures in reaction to a sudden reactivation of the epidemic. Mobile phone data, even when aggregated to preserve the privacy of individuals, are a useful data source to track the evolution in time of human mobility [8, 9], hence allowing for monitoring the effectiveness of control measures such as physical distancing [4, 5, 6].

In this report, we address the following analytical questions: How does the mobility structure of a territory change? Do incoming and outgoing flows become more predictable during the lockdown, and what are the differences between weekdays and weekends? Can we detect proper local job markets based on human mobility flows, to eventually shape the borders of a local outbreak? An interactive version of this report will be available at http://sobigdata.eu/covid_report

The raw data used in this report are the result of normal service operations performed by the mobile operator WINDTRE (see footnote 1): CDRs (Call Detail Records) and XDRs (eXtended Detail Records).
In both cases, the fundamental geographical unit is the "phone cell", defined as the area covered by a single antenna, i.e., the device that captures mobile radio signals and keeps the user connected to the network. Multiple antennas are usually mounted on the same tower, each covering a different direction. The position of the tower (expressed as latitude and longitude) and the direction of the antenna allow inferring the extent of the corresponding phone cell. The position of caller and callee is approximated by the cell of the antenna serving the call, whose extent is relatively small in urban contexts (in the order of 100 m x 100 m) and much larger in rural areas (in the order of 1 km x 1 km or more). Based on this configuration, CDRs describe the location of mobile phone users during call activities, and XDRs their location during data transmission for internet access. The information content of standard CDRs and XDRs is detailed below.

In both CDRs and XDRs, the identity of the users is replaced by artificial identifiers. The correspondence between such identifiers and the real identities of the users is known only to the mobile phone operator, who may use it if necessary. This pseudonymization procedure is a first important step (mentioned in Article 6(4) and Article 25(1) of the GDPR, the EU General Data Protection Regulation) towards anonymity [7, 10, 11]; the data are then turned into fully anonymous data before any further use.

For the analyses in this report, we used aggregated data computed by the mobile operator covering the period from February 3rd, 2020 to March 28th, 2020.
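The pseudonymization step described above can be illustrated with a minimal sketch. The report does not say how the operator generates its artificial identifiers, so everything here is an assumption: the class name `Pseudonymizer`, the use of random tokens, and the in-memory lookup table standing in for the correspondence that only the operator knows. The phone numbers are made up.

```python
import secrets


class Pseudonymizer:
    """Replace real subscriber identifiers with random artificial ones.

    Hypothetical helper: the operator's actual procedure is not
    described in the report.
    """

    def __init__(self):
        # Private lookup table (real identity -> pseudonym). In the
        # setting described in the text, only the operator would hold it.
        self._table = {}

    def pseudonym(self, real_id):
        # Reuse the same pseudonym for the same user, so that the
        # records of one user can still be linked to each other.
        if real_id not in self._table:
            self._table[real_id] = secrets.token_hex(8)
        return self._table[real_id]


p = Pseudonymizer()
a1 = p.pseudonym("+39 000 0000001")  # made-up number
a2 = p.pseudonym("+39 000 0000001")  # same user again
b = p.pseudonym("+39 000 0000002")   # a different user
```

Because the token is random rather than derived from the real identifier, nothing about the user can be read off the pseudonym itself; re-identification requires access to the private table.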
For each phone call, a tuple $\langle n_o, n_i, t, A_s, A_e, d \rangle$ is recorded, where $n_o$ and $n_i$ are pseudonymous identifiers of the "caller" and the "callee", respectively; $t$ is a timestamp indicating when the call was placed; $A_s$ and $A_e$ are the identifiers of the towers/antennas to which the caller was connected at the start and end of the call; finally, $d$ is the call duration (e.g., in minutes).

XDRs are similar to CDRs, except that the communication is only between the antenna and the connected mobile phone, and an amount $k$ of kilobytes is downloaded in the process. The format of an XDR is, therefore, a tuple $\langle n, t, A, k \rangle$.

Footnote 1: WINDTRE is one of the main mobile phone operators in Italy, covering around 32% of the residential "human" mobile market.

CDRs and XDRs are aggregated into daily municipality-to-municipality origin-destination (OD) matrices: there is one OD matrix per day, and each element $OD_{A,B}$ of the matrix gives the total number of trips from municipality A to municipality B. The presence of two consecutive points of a user in different municipalities indicates a movement, which is counted as a trip if the user stays in the destination municipality for at least one hour, and discarded otherwise. For a better matching with public COVID-19 data, we aggregated the municipality-to-municipality ODs into province-to-province ODs, in which each node represents an Italian province. The trips between municipalities of the same province have been aggregated into a self-loop, which indicates the province's internal mobility. As they are calculated by the operator, we store the daily municipality-to-municipality OD matrices and the daily province-to-province ones in a relational DBMS and access them through calls to a dedicated API.

Figure 1 visualizes the out-flows and in-flows of the province of Padua (region of Veneto, in the north-east of the country), for February 18th (before the lockdown, on the left) and March 24th (during the lockdown, on the right).
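As an aside on the aggregation step described above, the roll-up from municipality-level to province-level OD matrices can be sketched as follows. This is only an illustration under stated assumptions: the function `to_province_od`, the dictionary-based matrix representation, the small `PROVINCE_OF` lookup and all trip counts are hypothetical and do not reflect the operator's actual pipeline or data.

```python
from collections import defaultdict

# Hypothetical lookup from municipality to province code.
PROVINCE_OF = {
    "Padova": "PD",
    "Abano Terme": "PD",
    "Venezia": "VE",
    "Chioggia": "VE",
}


def to_province_od(municipal_od, province_of):
    """Sum municipality-to-municipality trips into province-level trips.

    Trips between two municipalities of the same province land on the
    key (P, P), i.e. the self-loop that represents internal mobility.
    """
    province_od = defaultdict(int)
    for (origin, destination), trips in municipal_od.items():
        key = (province_of[origin], province_of[destination])
        province_od[key] += trips
    return dict(province_od)


# One day of made-up trip counts between municipalities.
daily_od = {
    ("Padova", "Abano Terme"): 120,
    ("Abano Terme", "Padova"): 95,
    ("Padova", "Venezia"): 40,
    ("Chioggia", "Padova"): 12,
    ("Venezia", "Chioggia"): 30,
}

province_od = to_province_od(daily_od, PROVINCE_OF)
```

In this toy input, the two Padova/Abano Terme entries are summed into the self-loop of PD, while the Padova to Venezia entry remains an inter-province flow.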
The chart shows the flows among provinces with a stroke width proportional to the flow volume. The out-flows (in-flows) are first linked to the corresponding region and then to the final destination (origin). During the lockdown, we observe a drastic reduction of both the in- and out-flows (reported on labels in the corresponding circle), as well as a reduction in the number of provinces the flows come from or are directed to. The reduction in the number of origins and destinations is also evident in the other provinces of the country. For example, Figure 2 shows that this pattern is even more pronounced for the province of Bari, in the region of Puglia in the south-east of the country.

The two vertical lines indicate the dates of the national lockdown (March 9th) and the closure of non-necessary productive and economic activities (March 23rd). We observe a significant decrease in the volume of flows after the national lockdown, while we do not observe a comparable decrease soon after the closure of non-necessary activities.

The in-flow diversity of province A is computed as:

$$D_{in}(A) = -\frac{\sum_{x=1}^{P_{in}} p(x) \log p(x)}{\log(N)}$$

where $P_{in}$ is the number of provinces with non-null flow to province A, $p(x)$ is the probability that the in-flow to province A comes from province x, and $\log(N)$ is a normalization factor where $N = 110$ is the number of Italian provinces. The out-flow diversity of province A is computed similarly as:

$$D_{out}(A) = -\frac{\sum_{x=1}^{P_{out}} p(x) \log p(x)}{\log(N)}$$

where $P_{out}$ is the number of provinces with non-null flow from province A, and $p(x)$ is the probability that the out-flow from province A goes to province x.

The horizon charts in Figure 4 show the evolution of the in- and out-flow diversity for the four selected provinces, while those in Figure 5 refer to 30 provinces chosen randomly. The vertical axis lines represent time, and each rectangular section has a color proportional to the displayed measure (darker for larger values).
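The flow diversity defined earlier in this section can be sketched in a few lines. The sketch assumes that the measure is the Shannon entropy of the flow shares p(x) divided by log(N), which is what the definitions of p(x) and of the normalization factor suggest; the function name `flow_diversity` and the flow values are hypothetical.

```python
import math

N_PROVINCES = 110  # number of Italian provinces, as stated in the text


def flow_diversity(flows, n_provinces=N_PROVINCES):
    """Normalized Shannon entropy of a list of flow volumes.

    `flows` holds the trips exchanged with each other province. Zero
    entries are dropped, so only provinces with non-null flow
    contribute to the sum.
    """
    positive = [f for f in flows if f > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    entropy = -sum((f / total) * math.log(f / total) for f in positive)
    return entropy / math.log(n_provinces)


# All in-flow comes from a single province: fully predictable origin.
concentrated = flow_diversity([1000, 0, 0, 0])

# The same volume split evenly over four provinces: less predictable.
spread = flow_diversity([250, 250, 250, 250])
```

A lower value means that the flow is concentrated on fewer provinces, which is why a drop in diversity can be read as origins and destinations becoming more predictable.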
The circles on the left have an area proportional to the number of confirmed COVID-19 cases in the corresponding province up to March 24th. We find a progressive reduction of both the in- and out-flow diversity over time, with an acceleration soon after the beginning of the national lockdown (March 9th). Before the lockdown, the in- and out-flow diversities are slightly higher on weekends than on weekdays. The opposite is true during the lockdown: the in- and out-flow diversities are considerably lower on weekends than on weekdays. This result suggests that: (i) the origins and destinations of a province's mobility flows are more predictable during the lockdown than before it; (ii) on weekends, the origins and destinations of flows are more diverse before the lockdown than during it.

Figure 4. Horizon chart that describes the evolution in time of the in- and out-flow diversity of the provinces of Bergamo, Padua, Bari and Catania. The circles on the left have an area proportional to the number of confirmed COVID-19 cases in the corresponding province up to March 24th. Horizon charts compact the area chart by slicing it horizontally, and then shifting the slices to baseline zero. Black solid vertical lines indicate the dates of the national lockdown (March 9th) and the closure of non-necessary productive and economic activities (March 23rd). The white dashed vertical lines indicate Sundays. Note that, while the in- and out-flow diversities slightly increase on weekends before the lockdown, they decrease on weekends during the lockdown.

Figure 5. Horizon chart that describes the evolution in time of the in-flow diversity of 30 provinces (out of 110) chosen at random.
Solid vertical lines indicate the dates of the lockdown (March 9th) and the closure of non-necessary economic activities (March 23rd). The dashed vertical lines indicate Sundays. We observe an interesting pattern for weekends: while flow diversity slightly increases with respect to weekdays before the lockdown, it decreases during the lockdown.

We use the k-means clustering algorithm to discover k groups of provinces that are similar in terms of the evolution of their in- and out-flow diversity. To find the best value of k, we repeat the algorithm for k = 2, ..., 20. For both the in- and out-flow diversities, we find that k = 5 is the best choice with respect to the within-cluster distance. Figure 6 shows the centroids of the five clusters of in-flow diversity. Although the clusters' trends are similar, they have different typical in-flow diversities. The figure showing the clusters' centroids for the evolution of the out-flow diversity is provided in the Appendix. Figure 7 visualizes the evolution of the in-flow diversity for all the provinces in cluster 4, the one with the highest typical in-flow diversity. The horizon charts for the other four clusters are provided in the Appendix.

Figure 9 shows the local job markets we found in Puglia (a region in the south-east of the country) before the lockdown (top) and during the lockdown (bottom). Note the fragmentation of the territory during the lockdown, especially for the easternmost and the westernmost parts of the region.

Figure 9. Local job markets in Puglia (a region in the south-east of the country) before the lockdown (left) and during the lockdown (right).

In this first report on the analysis of mobility flows using mobile phone data up to March 28th, 2020, we find several interesting results. First, regarding the volume of in-, out- and self-flows between provinces, we find a significant decrease after the national lockdown (March 9th). Still, we do not find any significant decrease soon after the closure of the non-necessary productive activities.
Regarding the in- and out-flow diversities, we find that while there is a slight increase in the flow diversity on weekends before the lockdown, there is a strong decrease in the flow diversity on weekends during the lockdown. Moreover, the application of data mining techniques reveals the presence of five main clusters of provinces. Finally, we use a community detection algorithm to find local job markets in Italy. We observe a striking increase in the number of communities during the lockdown and a slight increase after the closure of non-necessary activities. This suggests that reduced mobility split the territory into more and smaller local job markets. This information may be exploited by decision- and policy-makers to plan "phase 2" of the management of the epidemic.

In the next report, we will investigate in depth how the structure of the OD matrices evolves in time, and we will extend the period of observation to the most recent days. We will also focus our analysis on some specific regions, considering the evolution of the epidemic at the municipality level. We will assess the impact of mobility reduction on the outbreak, answering several analytical questions: What is the virus-spreading effect generated by late-February north-south flows? How large should a "red zone" be to effectively reduce the spread of the epidemic?

Evolution of the out-flow diversity of the five clusters' centroids. The area around the line indicates the deviation of the provinces from the centroid. Note that, though the clusters have similar trends, they have different typical out-flow diversities.

[...] in-flows, indicating the total number of people moving to the province from any other province in Italy on that day.

LOCAL JOB MARKETS - Economic activities are linked by input-output relationships, with interconnected supply chains that are difficult to isolate.
Local Job Markets (LJM) take into account the home-work displacements (commuting) that occur between different municipalities. Each LJM is partially isolated from the others. [...] This can be done by analyzing the municipality-to-municipality OD matrices as weighted directed graphs [1] and using a community detection algorithm [2] to discover a collection of well-bounded mesoscale topologies, e.g., municipality clusters. Note that community detection algorithms can provide different results depending on their definition of [...]

[...] there has been a striking increase in the number of communities, indicating that people moved within smaller areas. For example, note that on Monday [...]

[1] Discovering the geographical borders of human mobility.
[2] Community detection in graphs.
[3] An information-theoretic framework for resolving community structure in complex networks.
[4] Mobile phone data and COVID-19: Missing an opportunity.
[5] Aggregated mobility data could help fight COVID-19.
[6] Measuring Levels of Activity in a Changing City: A Study Using Cellphone Data Streams.
[7] On the privacy-conscientious use of mobile phone data.
[8] A survey of results on mobile phone datasets analysis.
[9] Returners and explorers dichotomy in human mobility.
[10] PRIMULE: Privacy risk mitigation for user profiles.
[11] A Data Mining Approach to Assess Privacy Risk in Human Mobility Data.
[12] An analytical framework to nowcast well-being using mobile phone data.
[13] COVID-19 outbreak response: first assessment of mobility changes in Italy following lockdown.