key: cord-0120044-hk55bzqm authors: Cintia, Paolo; Fadda, Daniele; Giannotti, Fosca; Pappalardo, Luca; Rossetti, Giulio; Pedreschi, Dino; Rinzivillo, Salvo; Bonato, Pietro; Fabbri, Francesco; Penone, Francesco; Savarese, Marcello; Checchi, Daniele; Chiaromonte, Francesca; Vineis, Paolo; Guzzetta, Giorgio; Riccardo, Flavia; Marziano, Valentina; Poletti, Piero; Trentini, Filippo; Bella, Antonino; Andrianou, Xanthi; Manso, Martina Del; Fabiani, Massimo; Bellino, Stefania; Boros, Stefano; Urdiales, Alberto Mateo; Vescio, Maria Fenicia; Brusaferro, Silvio; Rezza, Giovanni; Pezzotti, Patrizio; Ajelli, Marco; Merler, Stefano title: The relationship between human mobility and viral transmissibility during the COVID-19 epidemics in Italy date: 2020-06-04 journal: nan DOI: nan sha: 64173741d028f809258828993efe2e766c7323b8 doc_id: 120044 cord_uid: hk55bzqm

We describe in this report our studies to understand the relationship between human mobility and the spreading of COVID-19, as an aid to manage the restart of the social and economic activities after the lockdown and monitor the epidemics in the coming weeks and months. We compare the evolution (from January to May 2020) of the daily mobility flows in Italy, measured by means of nation-wide mobile phone data, and the evolution of transmissibility, measured by the net reproduction number, i.e., the mean number of secondary infections generated by one primary infector in the presence of control interventions and human behavioural adaptations. We find a striking relationship between the negative variation of mobility flows and the net reproduction number, in all Italian regions, between March 11th and March 18th, when the country entered the lockdown. This observation allows us to quantify the time needed to "switch off" the country mobility (one week) and the time required to bring the net reproduction number below 1 (one week).
A reasonably simple regression model provides evidence that the net reproduction number is correlated with a region's incoming, outgoing and internal mobility. We also find a strong relationship between the number of days above the epidemic threshold before the mobility flows reduce significantly as an effect of lockdowns, and the total number of confirmed SARS-CoV-2 infections per 100k inhabitants, thus indirectly showing the effectiveness of the lockdown and the other non-pharmaceutical interventions in the containment of the contagion. Our study demonstrates the value of "big" mobility data for monitoring key epidemic indicators and informing choices as the epidemic unfolds in the coming months.

Understanding the relationship between human mobility patterns and the spreading of COVID-19 is crucial to the restart of social and economic activities, limited or put in "stand-by" during the national lockdown to contain the diffusion of the epidemics, and to monitor the risk of a resurgence during the current phase 2, or lockdown exit. Recent analyses document that, following the national lockdown of March 11th, the mobility flows in Italy decreased by 50% or more everywhere in the country, as shown in our previous report [19] and in [10, 21]. In this report we study the relation between human mobility and SARS-CoV-2 transmissibility before, during and after the national lockdown. We compare the flows of people between and within Italian regions with the net reproduction number R_t, i.e., the mean number of secondary infections generated by one primary infector in the presence of control interventions and human behavioural adaptations.
To pursue this goal, we use mobile phone data at national scale to reconstruct the self-, in- and out-flows of Italian regions before and during the national lockdown (initiated on March 11th, 2020), after the closure of non-essential productive and economic activities (March 23rd, 2020), and after the partial restart of economic activities and within-region movements (the "phase 2", from May 4th, 2020). In this report, we address the following analytical questions:
• How does the net reproduction number vary in relation to the variation of mobility flows?
• What differences, if any, do we observe across the Italian regions?
• Can we relate the delay in limiting human mobility with the rate of positive COVID-19 cases across the population?
The answers to these questions are highlighted in the next sections. An interactive, dynamically updated version of this report is available at http://sobigdata.eu/covid_report/#/report2

In this report, we rely on mobile phone data, which have proven to be a useful data source to track the time evolution of human mobility [5, 6, 11], and thus a tool for monitoring the effectiveness of control measures such as movement restrictions and physical distancing [1, 2, 3]. Specifically, the raw data used in this report are the result of normal service operations performed by the mobile operator WINDTRE (see footnote a): CDRs (Call Detail Records) and XDRs (eXtended Detail Records). In both cases, the fundamental geographical unit is the "phone cell", defined as the area covered by a single antenna, i.e., the device that captures mobile radio signals and keeps the user connected with the network. Multiple antennas are usually mounted on the same tower, each covering a different direction. The position of the tower (expressed as latitude and longitude) and the direction of the antenna allow inferring the extension of the corresponding phone cell.
The position of caller and callee is approximated by the corresponding antenna serving the call, whose extension is relatively small in urban contexts (in the order of 100m x 100m) and much larger in rural areas (in the order of 1km x 1km or more). Based on this configuration, CDRs describe the location of mobile phone users during call activities and XDRs their location during data transmission for internet access. The information content provided by standard CDR and XDR is the following:

CDR - For each phone call, a tuple (n_o, n_i, t, A_s, A_e, d) is recorded, where n_o and n_i are pseudo-anonymous identifiers, respectively of the "caller" and the "callee"; t is a timestamp saying when the call was placed; A_s and A_e are the identifiers of the towers/antennas to which the caller was connected at the start and end of the call; finally, d is the call duration (e.g., in minutes).

XDR - XDRs are similar to CDRs, except that the communication is only between the antenna and the connected mobile phone, and an amount k of kilobytes is downloaded in the process. The format of an XDR is, therefore, a tuple holding the pseudo-anonymous identifier of the user, the timestamp, the identifier of the serving antenna, and the amount k of kilobytes downloaded.

In both CDRs and XDRs, the identity of the users is replaced by artificial identifiers. The correspondence between such identifiers and the real identities of the users is known only to the mobile phone operator, who might use it in case of necessity. This pseudonymization procedure is a first important step (mentioned in Article 6 (4) and Article 25 (1) of the GDPR, the EU General Data Protection Regulation) towards anonymity [4, 7, 8]; the data are then turned into fully anonymous data before any further use. For the analyses in this report, we used aggregated data computed by the mobile operator covering the period January 13th, 2020 to May 17th, 2020.

a. WINDTRE is one of the main mobile phone operators in Italy, covering around 32% of the residential "human" mobile market.
Origin-Destination matrices - For a better matching with the available COVID-19 data (number of positive cases and net reproduction number), we aggregated the municipality-to-municipality origin-destination matrices (ODs) into province-to-province or region-to-region ODs, in which each node represents an Italian province or region. In particular, for each day, we compute both the out-flows, indicating the total number of people moving from a province/region to any other province/region, and the in-flows, indicating the total number of people moving to a province/region from any other province/region. The trips between municipalities of the same province/region are aggregated into a self-flow, which indicates the province/region's internal mobility. For privacy reasons, we eliminate all out-, in- and self-flows with values lower than 15. As they are calculated by the operator, we store the daily municipality-to-municipality OD matrices and the daily region-to-region ones into a relational DBMS and access them through calls to a dedicated API. We normalize the self-, in- and out-flows by multiplying them by coefficients provided by the mobile phone operator, which indicate an estimation of market share for every municipality. After this transformation, we have an estimation of the real size of the mobility flow between each origin and destination municipality.

For ease of readability, Figures 1, 2 and 3 visualize the OD matrix of flows between Italian regions on February 18th (before the initiation of the national lockdown on March 9th), March 24th (during the lockdown), and May 12th (during phase 2), respectively. We find that the OD matrix becomes significantly more sparse during the lockdown and the phase 2, denoting a drastic reduction of the routes between Italian regions. Numerically, we estimate this sparsity through the network density, i.e., the proportion of the potential connections in a network that are actual connections.
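The aggregation into self-, in- and out-flows and the network density described above can be illustrated with a minimal sketch. It is not the operator's pipeline: the function name `flows_and_density`, the toy 3x3 matrix and the choice of applying the privacy threshold to individual OD cells are assumptions made for illustration, as is the treatment of the network as directed and without self-loops.

```python
import numpy as np

def flows_and_density(od, min_flow=15):
    """Return self-, in-, out-flows and network density of a square OD matrix.

    od[i, j] is the (hypothetical) number of people moving from region i
    to region j on one day. Entries below min_flow are dropped, mirroring
    the privacy threshold described in the text (assumed here to act on cells).
    """
    od = np.array(od, dtype=float)       # copy, so the input is left untouched
    od[od < min_flow] = 0.0
    self_flow = np.diag(od).copy()       # trips that stay inside a region
    between = od - np.diag(self_flow)    # keep only region-to-region trips
    out_flow = between.sum(axis=1)       # row sums: people leaving region i
    in_flow = between.sum(axis=0)        # column sums: people entering region j
    n = od.shape[0]
    # Assumption: directed network without self-loops, so n*(n-1) possible links.
    density = np.count_nonzero(between) / (n * (n - 1))
    return self_flow, in_flow, out_flow, density

# Toy example with three regions (values are invented).
od = [[100, 20, 5],
      [30, 200, 0],
      [0, 16, 50]]
self_flow, in_flow, out_flow, density = flows_and_density(od)
print(out_flow, in_flow, density)
```

In the toy matrix, three of the six possible region-to-region links survive the threshold, so the density is 0.5.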
We find that network density halves during the lockdown: it decreases from d_18feb = 0.47 to d_24mar = 0.23, indicating that the lockdown erases half of the connections between regions compared to the previous period. Network density remains almost unchanged between the lockdown and the phase 2 (d_12may = 0.2), presumably because movements between regions are still forbidden by law except for specific circumstances (e.g., commuting for work). These results clearly highlight the drastic change in the structure of the human mobility network between regions in the two periods taken into consideration. Indeed, most of the regional out-flows go towards adjacent regions. For example, before the lockdown, most of Lombardy's out-flow is directed towards Veneto, Piedmont and Emilia-Romagna (adjacent regions), and the rest of the out-flows distribute more or less uniformly across all other regions, both in the North and the South (Figure 1). In contrast, during the lockdown, the number of these more modest out-flows decreases substantially, and most of them disappear altogether (Figure 2).

Figure 1 (before lockdown): The width of the arrows is proportional to the flow between the two regions. The density of the flows network is d_18feb = 0.47. On the left, numbers in parentheses indicate the out-flow. On the right, numbers in parentheses indicate the in-flow.

Figure 2: The width of the arrows is proportional to the flow between the two regions. The density of the flows network is d_24mar = 0.23. On the right, numbers in parentheses indicate the in-flow.

Figure 3: The width of the arrows is proportional to the flow between the two regions. The density of the flows network is d_12may = 0.2. On the right, numbers in parentheses indicate the in-flow.
Flows between Italian regions - Another important aspect of the mobility of a region or a province, complementary to the volumes of incoming and outgoing flows, is the diversification of the provenance and the destination of people. Specifically, we define the in-flow diversity of a province A as the Shannon entropy of the in-flows to the province [9]:

$$E_{in}(A) = -\frac{\sum_{x=1}^{P_{in}} p(x)\,\log p(x)}{\log N}$$

where P_in is the number of provinces with non-null flow to province A, p(x) is the probability that the in-flow to province A comes from province x, and log(N) is a normalization factor where N = 110 is the number of Italian provinces. The out-flow diversity of province A is computed similarly as:

$$E_{out}(A) = -\frac{\sum_{x=1}^{P_{out}} p(x)\,\log p(x)}{\log N}$$

where P_out is the number of provinces with non-null flow from province A, and p(x) is the probability that the out-flow from province A goes to province x. Mobility diversity during the pre-lockdown and lockdown period has been studied in our first report [19].
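As a small illustration of the diversity measure defined above, the following sketch computes the normalized Shannon entropy of a list of flows. The function name `flow_diversity` and the input format (one non-negative flow value per origin or destination province) are assumptions for the example; the same function serves for in-flow and out-flow diversity, depending on which flows are passed.

```python
import math

def flow_diversity(flows, n_provinces=110):
    """Shannon entropy of a flow distribution, normalised by log(N).

    flows: non-negative flow volumes, one per origin (in-flow diversity)
    or per destination (out-flow diversity). Zero flows are ignored,
    so the sum runs only over provinces with non-null flow.
    """
    total = sum(f for f in flows if f > 0)
    if total == 0:
        return 0.0                      # no mobility, no diversity
    entropy = 0.0
    for f in flows:
        if f > 0:
            p = f / total               # probability that a trip uses this link
            entropy -= p * math.log(p)
    return entropy / math.log(n_provinces)
```

A province whose whole in-flow comes from a single origin has diversity 0, while flows spread evenly over many provinces push the value towards 1.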
We compare the evolution of the out-, in- and self-flows with the evolution of the daily disease transmissibility in Italian regions, measured in terms of the net reproduction number R_t.
The net reproduction number represents the mean number of secondary infections generated by one primary infector, in the presence of control interventions and human behavioural adaptations. When R_t decreases below the epidemic threshold of 1, the number of new infections begins to decline. The estimates of R_t were computed from the daily time series of new cases by date of symptom onset. Case-based surveillance data used for estimating R_t were collected by regional health authorities and collated by the Istituto Superiore di Sanità using a secure online platform, according to a progressively harmonized track-record. Data include, among other information, the place of residence, the date of symptom onset and the date of first hospital admission for laboratory-confirmed COVID-19 cases [12]. The distribution of the net reproduction number R_t was estimated by applying a well-established statistical method [13] [14] [15], which is based on the knowledge of the distribution of the generation time and on the time series of cases. In particular, the posterior distribution of R_t for any time point t was estimated by applying Metropolis-Hastings MCMC sampling to a likelihood function defined as follows:

$$\mathcal{L} = \prod_{t=1}^{T} P\Big(C(t);\; R_t \sum_{s=1}^{t} \varphi(s)\, C(t-s)\Big)$$

where P(κ;λ) is the probability mass function of a Poisson distribution (i.e., the probability of observing κ events if these events occur with rate λ); T is the number of days in the time series; C(t) is the daily number of new cases having symptom onset at time t; R_t is the net reproduction number at time t to be estimated; φ(s) is the probability distribution density of the generation time evaluated at time s. As a proxy for the distribution of the generation time, we used the distribution of the serial interval, estimated from the analysis of contact tracing data in Lombardy [12], i.e., a gamma distribution with shape 1.87 and rate 0.28, having a mean of 6.6 days. This estimate is within the range of other available estimates for SARS-CoV-2 infections, i.e., between 4 and 7.5 days [16] [17] [18].
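The estimation idea described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a single constant reproduction number over a chosen window of days, a flat prior on positive values, a Gaussian random-walk proposal, and a generation-time distribution discretised to whole days and truncated at 30 days. The function names and default settings are hypothetical; only the gamma shape (1.87) and rate (0.28) come from the text.

```python
import math
import numpy as np

def generation_weights(shape=1.87, rate=0.28, max_days=30):
    """Discretised gamma density phi(s), s = 1..max_days, renormalised to sum to 1."""
    s = np.arange(1, max_days + 1, dtype=float)
    log_pdf = (shape * math.log(rate) + (shape - 1) * np.log(s)
               - rate * s - math.lgamma(shape))
    phi = np.exp(log_pdf)
    return phi / phi.sum()

def infection_pressure(cases, phi):
    """pressure[t] = sum_s phi(s) * C(t - s), truncated at the start of the series."""
    cases = np.asarray(cases, dtype=float)
    pressure = np.zeros(len(cases))
    for t in range(1, len(cases)):
        k = min(t, len(phi))
        # cases[t-1], cases[t-2], ..., cases[t-k] paired with phi(1), ..., phi(k)
        pressure[t] = np.dot(phi[:k], cases[t - k:t][::-1])
    return pressure

def log_likelihood(r, cases, pressure):
    """Poisson log-likelihood for a constant r, up to a constant that does not depend on r."""
    lam = r * pressure
    return float(np.sum(cases * np.log(lam) - lam))

def sample_r(cases, first_day, n_iter=4000, step=0.1, seed=0):
    """Metropolis-Hastings samples of r using days first_day..end (pressure must be > 0 there)."""
    cases = np.asarray(cases, dtype=float)
    pressure = infection_pressure(cases, generation_weights())
    c, p = cases[first_day:], pressure[first_day:]
    rng = np.random.default_rng(seed)
    r, ll = 1.0, log_likelihood(1.0, c, p)
    samples = []
    for _ in range(n_iter):
        proposal = r + rng.normal(0.0, step)      # symmetric random-walk proposal
        if proposal > 0:                          # flat prior restricted to r > 0
            ll_new = log_likelihood(proposal, c, p)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                r, ll = proposal, ll_new
        samples.append(r)
    return np.array(samples[n_iter // 2:])        # discard the first half as burn-in
```

Calling `sample_r(cases, first_day=15)` on a daily case series returns posterior draws whose mean and quantiles summarise the reproduction number over the window; a day-by-day R_t would repeat the procedure on successive windows.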
Figures 4 and 5 show the evolution of the mobility self-flows (blue curves), the net reproduction number (orange curves) and the number of positive cases (grey curves) for the northern regions and central-southern regions, respectively. These curves reveal several insights. All regions show a sharp decrease of the self-flow soon after the first national lockdown (March 11th [22]). The flows stabilized at the new, reduced volume after about one week. Subsequent restriction ordinances, such as the closing of non-essential economic activities on March 17th [23], had a minor impact on the reduction of self-flows. For almost all regions, we find an increase of the self-flow since the start of phase 2 on May 4th. This behaviour is particularly pronounced for Emilia-Romagna, Toscana, Puglia, and Lazio. Further investigation is needed to understand why these regions had such a marked increase.
Interestingly, we find a slight increase in the self-flows approaching May 4th, the start of "phase 2", during which a wider range of movement within regions has been allowed by the government. We interpret this result as a progressive, although slight, relaxation of compliance with the mobility limitations imposed by the lockdown. The case of Molise is different and particularly compelling: it is the only region for which the self-flow decreases after May 4th (Figure 5). This decrease may be due to news media coverage about a funeral on April 30th, attended by a large number of people, which resulted in a large local outbreak. This may have induced parts of the Molise population to self-restrict movements during the following days (see footnote 2).

2. https://www.ilfattoquotidiano.it/2020/05/13/coronavirus-nuovo-focolaio-in-molise-72-contagi-a-campobasso-legati-a-un-funerale-della-comunita-rom/5800192/

The number of COVID-19 positive cases is provided by Protezione Civile, the Italian public institution in charge of monitoring the COVID-19 emergency. They collect data from every Italian administrative region and make them available on a public GitHub repository [20]. For each region, we focus on the number of new positive cases per day. Specifically, given a day g, we compute the average of values over the four days before and the four days after g.

Although the date when R_t = 1 for the first time varies from region to region, for all regions R_t decreases concurrently with the sharp decrease of self-flows due to the beginning of the national lockdown (Figures 4 and 5), highlighting the importance of the government intervention. Note that R_t starts taking values lower than 1 from March 16th, when the self-flows stabilize at the new, reduced volume. From that moment on, self-flows remain stable. Still, R_t continues decreasing.
This may be due to other ordinances by local and national governments related to the wearing of masks and gloves in public areas, social distancing and the ban on gatherings, and possibly to other factors to be further investigated.

Figure 4: Evolution between January 13th and May 17th of self-flow (blue curve), net reproduction number R_t (orange curve) and moving average of the number of confirmed SARS-CoV-2 infections (grey curve) in the northern regions of Italy. For each day, we plot the average of the R_t values of the three days before and after that day. The orange-shaded area indicates R_t > 1. The value of the grey curve for a given day is computed as the average of the number of confirmed SARS-CoV-2 infections over the four days before and the four days after that day. The vertical dashed lines indicate the beginning of the national lockdown (LD, March 9th, 2020), the closing of non-essential economic activities (CNA, March 23rd, 2020) and the partial restarting of economic activities and within-region movements ("Ph 2", May 4th, 2020). The area in white indicates the period before mobility reduction (MR) in that region. Note that the beginning of MR does not necessarily coincide with the national lockdown (e.g., see Lombardia).

Figure 5: Evolution between January 13th and May 17th of self-flows (blue curve), net reproduction numbers (orange curve) and moving average of the number of confirmed SARS-CoV-2 infections (grey curve) in the central-southern regions of Italy. For each day, we plot the average of the R_t values of the three days before and after that day. The orange-shaded area indicates R_t > 1. The value of the grey curve for a given day is computed as the average of the number of confirmed SARS-CoV-2 infections over the four days before and the four days after that day. The vertical dashed lines indicate the beginning of the national lockdown (LD, March 9th, 2020), the closing of non-essential economic activities (CNA, March 23rd, 2020) and the partial restarting of economic activities and within-region movements ("Ph 2", May 4th, 2020). The area in white indicates the period before mobility reduction (MR) in that region. Note that the beginning of MR does not necessarily coincide with the national lockdown.

Can we estimate the value of R_t of Italian regions (or provinces) from the mobility of their population? This is a complex question, which we address here only preliminarily, focusing on in- and out-flow diversity and switching from the regional level used for the analysis above to the level of provinces. A first-cut regression model for estimating the daily R_t as a function of mobility diversity is specified as follows, with β and γ denoting the coefficients of the in- and out-flow diversity:

$$R_{it} = \alpha_i + \beta\,\mathrm{indiversity}_{it} + \gamma\,\mathrm{outdiversity}_{it} + \delta_t + \varepsilon_{it}$$
where α_i indicates the fixed effect of province i (to control for the non-observable heterogeneity between the provinces), indiversity_it and outdiversity_it are the daily in- and out-flow diversities E_in(i) and E_out(i) for province i on day t, δ_t indicates the fixed effect of day t, ε_it are stochastic residuals, and R_it (the outcome variable) is the net reproduction number estimated for day t and province i. We considered 89 of 107 Italian provinces for which a sufficient number of symptomatic cases had been recorded for a reliable computation of the estimate. As an outcome of regressing the daily R_t on the in- and out-flow diversities of the same day, we find that outdiversity_it contributes to reducing R_t and indiversity_it to increasing it, but these effects are not statistically significant.

The picture changes substantially if we introduce time lags, e.g., through the model:

$$R_{it} = \alpha_i + \sum_{j} \beta_j\,\mathrm{indiversity}_{i,t-j} + \sum_{j} \gamma_j\,\mathrm{outdiversity}_{i,t-j} + \delta_t + \varepsilon_{it}$$

in which β_j and γ_j are the coefficients at lag j, and the regressors cover the entire week before the measurement of R_t.
In Table 1 (column 3), we consider the fixed effects of province only, while in column 4 we add the fixed effect for day. For j = 2, we find that indiversity_{i,t-2} contributes to increasing R_it and outdiversity_{i,t-2} to reducing it, with statistical significance at the 10% level. If we extend the lag beyond one week (j ≥ 7), we find stronger statistical significance (Table 1, columns 5 and 6). For out-flow diversity, the closer the day is to the date of the R_t estimate (j < 7), the stronger the impact on contagion. Conversely, for in-flow diversity, the further the day is from the date of the R_t estimate (j ≥ 7), the stronger the impact on contagion. Figure 6 reports the temporal profile of the coefficients estimated in column 5 of Table 1.

Influence of variables on disease transmissibility, measured as the net reproduction number R_t. Robust standard errors in brackets (*** p<0.01, ** p<0.05, * p<0.1); robust errors clustered by province; country fixed effects included.

With entropic measures we cannot control for movements within each province. If we replace mobility diversity with the in-flows, out-flows and self-flows of each province (the number of individuals registered as either changing province or moving within the province), we find similar statistical significance for mobility in the previous week, confirming that the outflow of people contributes to reducing contagion, while the arrival of people from outside the province and internal mobility raise it. While a more detailed modeling of the association between contagion and mobility is certainly needed, and may lead to additional insights, these preliminary results, together with the analysis of self-flows presented in the previous section, provide clear evidence for a critical role of human mobility in the spatio-temporal unfolding of the epidemics.
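As an illustration of the kind of estimation discussed above, the following is a minimal sketch of a two-way fixed-effects regression (province and day effects) with a single lag j, fitted by the within transformation on a balanced panel. It is not the procedure used to produce Table 1: it uses one lag instead of a full week of lags, computes no standard errors, and all names (`two_way_demean`, `fit_lagged_fe`, `synthetic_panel`) and the synthetic data are assumptions made for the example.

```python
import math
from statistics import fmean


def two_way_demean(panel):
    """Within transformation for a balanced panel stored as panel[province][day]."""
    n, T = len(panel), len(panel[0])
    row_means = [fmean(row) for row in panel]  # one mean per province
    col_means = [fmean([panel[i][t] for i in range(n)]) for t in range(T)]  # one per day
    grand = fmean(row_means)
    return [[panel[i][t] - row_means[i] - col_means[t] + grand
             for t in range(T)] for i in range(n)]


def fit_lagged_fe(r, indiv, outdiv, j):
    """Estimate (beta, gamma) in
    R_it = alpha_i + beta*indiv_{i,t-j} + gamma*outdiv_{i,t-j} + delta_t + eps_it."""
    # Align the outcome on day t with the regressors on day t - j.
    y = [row[j:] for row in r]
    x1 = [row[:len(row) - j] for row in indiv]
    x2 = [row[:len(row) - j] for row in outdiv]
    # Remove province and day fixed effects, then flatten to long vectors.
    y, x1, x2 = [[v for row in two_way_demean(p) for v in row] for p in (y, x1, x2)]
    # Solve the 2x2 normal equations of ordinary least squares.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def synthetic_panel(n=4, T=14, j=2, beta=0.5, gamma=-0.3):
    """Noise-free toy data (hypothetical values, not the report's data)."""
    indiv = [[math.sin(1.3 * i + 0.7 * t) for t in range(T)] for i in range(n)]
    outdiv = [[math.cos(0.9 * i * t + 0.4 * t) for t in range(T)] for i in range(n)]
    r = [[1.0 + 0.1 * i + 0.02 * t
          + (beta * indiv[i][t - j] + gamma * outdiv[i][t - j] if t >= j else 0.0)
          for t in range(T)] for i in range(n)]
    return r, indiv, outdiv


r, indiv, outdiv = synthetic_panel()
beta_hat, gamma_hat = fit_lagged_fe(r, indiv, outdiv, j=2)
print(beta_hat, gamma_hat)
```

On the noise-free toy panel the estimator recovers the coefficients used to generate the data, which is only a sanity check of the mechanics; with real data, clustered standard errors as in the table note would be needed to assess significance.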
We go one step further in our analysis of the relationships between mobility flows and contagion by computing two quantities for each region: 1) the delay in mobility reduction, i.e., the number of days in which R_t > 1 before the mobility flows of a region decrease by at least 20% w.r.t. the usual (pre-epidemics) weekly mobility, observed over January and the first two weeks of February, and 2) the total number of reported SARS-CoV-2 infections per 100k inhabitants in the region (as of May 15th, 2020). The date at which mobility in each region has dropped by at least 20% is indicated by the black vertical line MR in the time series of Figures 4 and 5. Figure 7 shows a scatter plot of the two quantities for all regions, where the size of the circle of each region is proportional to the total number of reported infections in the observation period; the positive correlation between the two quantities is robust (Pearson coefficient = 0.46, p < 0.05, r² = 0.21), suggesting that larger delays could have induced heavier spreading of the virus. This is strong evidence that timely lockdowns are instrumental for better containment of the contagion. Two further considerations follow. The mobility reduction in Lombardy started around 32 days after the first day in which R_t > 1, leading to the highest number of positive cases per inhabitant in Italy. Similarly, for other regions severely affected by the virus, such as Liguria, Emilia-Romagna and Piedmont, the mobility reduction started around, respectively, 32 and 38 days after the first day in which R_t > 1. The central-western regions of northern Italy lie above the dashed regression line, in the top right part of the plot. The regions below this line in the bottom right part of the plot, such as Veneto, Lazio, and Tuscany, were more effective in containing the contagion than the regions above the line, despite a delay of 30 days or more.
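The two quantities defined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for Figure 7: it reads "delay" as the count of days with R_t > 1 that precede the first day on which mobility is at least 20% below the baseline, it takes a single scalar baseline rather than a weekly profile, and the function names and toy series are hypothetical.

```python
import math


def delay_in_mobility_reduction(rt, mobility, baseline, drop=0.20):
    """Count days with R_t > 1 before mobility first falls at least `drop` below baseline."""
    threshold = (1.0 - drop) * baseline
    delay = 0
    for r, m in zip(rt, mobility):
        if m <= threshold:  # mobility reduction (MR) reached: stop counting
            break
        if r > 1.0:
            delay += 1
    return delay


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy daily series for one hypothetical region.
rt = [0.8, 1.2, 1.5, 1.4, 1.1, 0.9]
mobility = [100, 98, 95, 90, 79, 60]
print(delay_in_mobility_reduction(rt, mobility, baseline=100))
```

For a simple linear fit with one predictor, the coefficient of determination is the square of the Pearson coefficient, so the reported value of 0.46 corresponds to r² ≈ 0.21: the delay accounts for roughly a fifth of the variance in cumulative incidence across regions.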
The lower incidence in these regions may be explained by several factors, including the effectiveness of epidemic surveillance, the intensity of the testing and tracing strategy adopted, the capacity for outbreak containment, and the absolute number of cases when R_t rose above 1. On the other hand, for southern regions the mobility reduction started with a delay of around 10 days (Molise, Basilicata) or around 20 days (Campania, Puglia, Sicily, Calabria), which presumably was effective in containing the spread of the virus: all southern regions (with the only exception of Molise) are below the regression line in the bottom left part of the plot (low number of infections per 100k inhabitants). Central-southern regions are the ones that benefited the most from the lockdown, presumably because it started in a more timely fashion. This brings further evidence of the effectiveness of the lockdown.

In the scatter plot, the horizontal axis has the number of days between the first time R_t > 1 and the beginning of the national lockdown. The vertical axis has the cumulative incidence of confirmed SARS-CoV-2 infections per 100k inhabitants (as of May 15th, 2020). The size of the circles is proportional to the total number of positive cases in the period (Pearson coefficient = 0.46, p < 0.05, r² = 0.21).

Our combined analysis of mobility and epidemics highlighted a striking relation between the negative variation of movement fluxes and the negative variation of the net reproduction number, in all Italian regions, in the time interval of approximately one week, from March 9th till March 16th, during the transition between the two mobility modalities. During this week, the two curves overlap closely; at the end of this week, the country has reached a new "stable" mobility regime, approximately at 40% of the pre-lockdown level.
The two curves exhibit the same pattern everywhere, both at regional and provincial level, with minimal temporal lags. We call this phenomenon, represented schematically in Figure 8, the "epi-mob" pattern. Mobility, the blue curve, is a "switch" between two very different levels, before and during the lockdown, with an exponential fall from the first to the second; the epidemics is a peaked distribution with an exponential growth and fall, overlapping with the "switch" during the fall. The presumable effectiveness of the lockdown for the containment of the epidemics is further substantiated by the pattern in Figure 7, which relates the timeliness of the lockdown in every region to the total number of infected individuals per 100k inhabitants. We can also quantify the time needed to "switch off" the country's mobility (approx. 1 week to reach the new lower regime) and the time needed to bring the net reproduction number below 1 (again, approx. 1 week). Notice that R_t continues to decrease slowly during the lockdown, and that at the beginning of phase 2 (lockdown exit), when mobility begins to rise again, R_t does not jump into a new uncontrolled growth, at least until May 17th, the last day of observation in this report. This is probably due to the non-pharmaceutical interventions in place, including people's increased compliance with the use of personal protective equipment and with social distancing (compared to the pre-lockdown phase). Another factor limiting the increase of R_t in phase 2 could be an increased ability to trace, test and isolate infected individuals. We have also shown that a simple regression model provides reasonable estimates of R_t in a given region (or province) as a function of the in- and out-flows. Clearly, the accuracy of the estimation is influenced by changes in containment interventions and in citizens' behaviour towards prevention measures.
Consequently, learning a regression model for R_t during pre-lockdown and early lockdown and applying it to predict R_t during phase 2 would probably lead to overestimating it; however, such a worst-case scenario might be useful for comparison with the R_t actually measured, as a means to evaluate the effectiveness of the containment policy in place and of citizens' social behavior. An interesting point that calls for further study is how to continuously learn an estimation model for R_t, whose accuracy is continuously monitored, for the purpose of nowcasting R_t, shortening as much as possible the time to wait until R_t becomes known. In conclusion, we believe that this study demonstrates the value of "big" mobility data, a detailed proxy of human behavior available every day in real time, for refining our understanding of the dynamics of the epidemics, reasoning on the effectiveness of policy choices for non-pharmaceutical interventions and on citizens' compliance with social distancing measures, and helping to monitor key epidemic indicators to inform choices as the epidemics unfolds in the coming months.

During the first week of lockdown, the two curves describing mobility flows and net reproduction number gracefully overlap. At the end of this week, the country has reached a new "stable" mobility regime, approximately at 40% of the pre-lockdown level. The two curves exhibit the same pattern everywhere, both at regional and provincial level, with minimal temporal lags.

APPENDIX (mobility inflow)

For each day, we plot the average of the R_t values of the three days before and after that day. The orange-shaded area indicates R_t > 1.
The value of the grey curve for a given day is computed as the average of the number of confirmed SARS-CoV-2 infections over the four days before and the four days after that day. The vertical dashed lines indicate the beginning of the national lockdown (LD, March 9th, 2020), the closing of non-essential economic activities (CNA, March 23rd, 2020) and the partial restarting of economic activities and within-region movements ("Ph 2", May 4th, 2020). The area in white indicates the period before mobility reduction (MR) in that region. Note that the beginning of MR does not necessarily coincide with the national lockdown.

Evolution between January 13th and May 17th of out-flows (blue curve), net reproduction number R_t (orange curve) and moving average of the number of confirmed SARS-CoV-2 infections (grey curve) in the northern regions of Italy. For each day, we plot the average of the R_t values of the three days before and after that day. The orange-shaded area indicates R_t > 1. The value of the grey curve for a given day is computed as the average of the number of confirmed SARS-CoV-2 infections over the four days before and the four days after that day. The vertical dashed lines indicate the beginning of the national lockdown (LD, March 9th, 2020), the closing of non-essential economic activities (CNA, March 23rd, 2020) and the partial restarting of economic activities and within-region movements ("Ph 2", May 4th, 2020). The area in white indicates the period before mobility reduction (MR) in that region. Note that the beginning of MR does not necessarily coincide with the national lockdown (e.g., see Lombardia).

Evolution between January 13th and May 17th of out-flows (blue curve), net reproduction numbers (orange curve) and moving average of the number of confirmed SARS-CoV-2 infections (grey curve) in the central-southern regions of Italy. For each day, we plot the average of the R_t values of the three days before and after that day.
The orange-shaded area indicates R_t > 1. The value of the grey curve for a given day is computed as the average of the number of confirmed SARS-CoV-2 infections over the four days before and the four days after that day. The vertical dashed lines indicate the beginning of the national lockdown (LD, March 9th, 2020), the closing of non-essential economic activities (CNA, March 23rd, 2020) and the partial restarting of economic activities and within-region movements ("Ph 2", May 4th, 2020). The area in white indicates the period before mobility reduction (MR) in that region. Note that the beginning of MR does not necessarily coincide with the national lockdown.
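The smoothing described in the appendix figure captions (R_t averaged over the three days before and after each day, confirmed infections over the four days before and after) can be sketched as a centered moving average. The captions do not say whether the day itself is included or how the ends of the series are treated; the sketch below assumes the day is included and that windows are truncated at the boundaries. The function name and sample values are hypothetical.

```python
def centered_moving_average(values, half_window):
    """Average each value with up to `half_window` neighbours on each side.

    Assumptions: the central day is included, and windows are truncated
    (not padded) at the start and end of the series.
    """
    n = len(values)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# half_window=3 mirrors the R_t curves, half_window=4 the infection curves.
rt_smoothed = centered_moving_average([1.8, 1.6, 1.3, 1.1, 0.9, 0.8, 0.7], 3)
cases_smoothed = centered_moving_average([10, 14, 30, 25, 40, 38, 20, 15, 12], 4)
```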