key: cord-0309052-yn3l7t91 authors: Perrotta, D.; Frias-Martinez, E.; Pastore y Piontti, A.; Zhang, Q.; Luengo-Oroz, M.; Paolotti, D.; Tizzoni, M.; Vespignani, A. title: Comparing Sources of Mobility for Modelling the Epidemic Spread of Zika Virus in Colombia date: 2021-08-10 DOI: 10.1101/2021.08.09.21261630 sha: c8a5eacfa0c105311a985f1690d52164f480b515 doc_id: 309052 cord_uid: yn3l7t91

Timely, accurate, and comparative data on human mobility is of paramount importance for epidemic preparedness and response, but is generally unavailable or not easily accessible. Mobile phone metadata, typically in the form of Call Detail Records (CDRs), represents a powerful source of information on human movements at an unprecedented scale. In this work, we investigate the potential benefits of harnessing aggregated CDR-derived mobility to predict the 2015-2016 Zika virus (ZIKV) outbreak in Colombia, compared with other, more traditional data sources. To simulate the spread of ZIKV at the sub-national level in Colombia, we employ a stochastic metapopulation epidemic model for vector-borne diseases. Our model integrates detailed data on the key drivers of ZIKV spread, including the spatial heterogeneity of mosquito abundance and the exposure of the population to the virus due to environmental and socio-economic factors. Given the same modelling settings (i.e. initial conditions and epidemiological parameters), we perform in-silico simulations for each mobility network and assess their ability to reproduce the local outbreak as reported by the official surveillance data. We assess the performance of our epidemic modelling approach in capturing the ZIKV outbreak both nationally and sub-nationally. Our model estimates are strongly correlated with the surveillance data at the country level (Pearson's r = 0.92 for the CDR-informed network).
Moreover, the model estimates generated with the CDR-informed mobility network performed strongly in reproducing the local outbreaks observed at the sub-national level. Compared with the CDR-informed network, the performance of the other mobility networks is either similar or substantially lower, with no added value in predicting the local epidemic. This suggests that mobile phone data capture a better picture of human mobility patterns. This work contributes to the ongoing discussion on the value of aggregated mobility estimates from CDR data that, with appropriate data protection and privacy safeguards, can be used for social impact applications and humanitarian action.

In 2015-2016, a large-scale outbreak of Zika virus (ZIKV) infection affected the Americas and the Pacific. The epidemic was first confirmed in Brazil in May 2015 and rapidly spread to a total of 50 countries and territories by the end of 2016 [1]. ZIKV infection is typically accompanied by mild illness, but following the increased incidence of neurological complications, including microcephaly in newborns and Guillain-Barré syndrome, the WHO declared a Public Health Emergency of International Concern (PHEIC) [2] in February 2016, which lasted for nearly 10 months. First isolated in the Zika forest of Uganda in 1947, ZIKV is primarily transmitted by infected Aedes mosquitoes [3, 4], which are also responsible for transmitting other infectious diseases, including dengue, chikungunya, and yellow fever. Other routes of transmission have been reported, such as sexual and perinatal transmission [5, 6, 7, 8] and transmission through blood transfusion [9]. The likelihood of sustained local transmission of ZIKV is therefore fuelled by the presence of Aedes mosquitoes, whose spatial heterogeneity and seasonal variability are in turn regulated by the local environment and climate [10].
Since mosquitoes cannot fly far and tend to spend their lifetime around where they emerge, human population movement is likely responsible for introducing ZIKV into new regions with favourable local conditions for mosquito proliferation and sustained disease transmission [11]. Human mobility is in fact a key driver of the spread of ZIKV, as well as of several other infectious diseases, increasing disease prevalence by introducing new pathogens into susceptible populations or by increasing social contacts between susceptible and infected individuals [12]. Timely, accurate, and comparative data on human mobility is therefore of paramount importance for epidemic preparedness and response, but is generally unavailable or not easily accessible. Traditional data, typically collected from censuses, is often inadequate due to a lack of spatial and temporal resolution, or may be completely unavailable in developing countries. Mathematical models, such as the gravity model of migration or the radiation model, represent an alternative that overcomes the scarcity of traditional data by synthetically quantifying mobility patterns at different scales. However, more detailed data on mixing patterns is generally needed to capture the spatio-temporal fluctuations in disease incidence [13, 14]. The recent availability of large geolocated datasets has revolutionized research in this field, making it possible to quantitatively study individual and collective mobility patterns generated by human activities in daily life [15]. In this context, mobile phone metadata, typically in the form of Call Detail Records (CDRs), represents a powerful source of information on human movements. Created by telecom operators for billing purposes and summarising mobile subscribers' activity (e.g.
phone calls, text messages and data connections), CDRs represent a relatively low-cost resource for drawing a high-level picture of human mobility patterns at an unprecedented scale [12]. The availability of aggregated CDR-derived mobility has impacted several research fields [16], with significant applications to the spatial modelling of many infectious diseases, such as malaria [17, 18], dengue [19], cholera [20], rubella [21], Ebola [22, 23], and COVID-19 [24, 25, 26, 27, 28].

In this study, we investigate the potential benefits of harnessing CDR data to predict the spatio-temporal spread of Zika virus in Colombia, at the sub-national level, during the 2015-2016 outbreak in the Americas [29]. We assess the potential improvement in predictive power gained by integrating aggregated cell phone-derived population movements into a spatially structured epidemic model, compared with more traditional methods (e.g. census data and mobility models). For this, we examine different sources of human mobility, including i) CDR data, derived from more than two billion encrypted and anonymized calls made by around seven [...]

[...] therefore not sufficient to capture the initial spread of infections in Colombia. Leveraging such a global approach allows us to inform our epidemic model with the travel-associated ZIKV infections entering Colombia, which may trigger local ZIKV transmission, and ultimately to assess the impact of internal mobility patterns on predicting the spatio-temporal dynamics of ZIKV transmission in Colombia.

medRxiv preprint doi: https://doi.org/10.1101/2021.08.09.21261630; this version posted August 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[...] Colombia are reported in Table S1 in the Supplementary Material.
Figure 1A shows the cu[...]

Figure 1B shows the distribution of population estimates by department.

[...] The data is in the form of an origin-destination (OD) matrix of daily population movements between municipalities. We aggregate flows spatially into departments and rescale them to reflect the 2015 population estimates. In the following, we refer to the census network as $w^{C}_{ij}$. Note that although this dataset is not recent and comprises only commuting patterns, we use it as a reference when comparing the various mobility networks.

We create synthetic mobility networks using two mathematical mobility models, namely the gravity model [32] and the radiation model [33]. The gravity model assumes that the flows $w_{ij}$ of individuals travelling from location $i$ with population $N_i$ to location $j$ with population $N_j$, placed at distance $d_{ij}$, take the following form [32]:

$$w_{ij} = C \, \frac{N_i^{\alpha} N_j^{\gamma}}{f(d_{ij})}, \qquad (1)$$

where $C$ is a proportionality constant, $\alpha$ and $\gamma$ tune the dependence on each location's size, and $f(d_{ij})$ is a distance-dependent function. By applying a multivariate linear regression analysis in the logarithmic scale, we estimate the free parameters in Eq. (1) that best fit the census data (see Table S2 in the Supplementary Material).

In the radiation model, instead, the flows $w_{ij}$ take the following form [33]:

$$w_{ij} = T_i \, \frac{N_i N_j}{(N_i + s_{ij})(N_i + N_j + s_{ij})}, \qquad (2)$$

where $N_i$ is the population living at origin $i$, $N_j$ is the population living at destination $j$, $s_{ij}$ is the total population living in a circle of radius $d_{ij}$ centred at $i$, excluding the populations of the origin and destination locations, and $T_i$ is the total outflow from $i$ (i.e. $T_i = \sum_{j \neq i} w_{ij}$). The radiation model is parameter-free (i.e. it does not require regression analysis or fitting to existing data); it only requires an estimate of the total number of travellers $T_i$ from the census data. Given these quantities, we apply the gravity law of Eq. (1) and the radiation law of Eq. (2) on a fully connected synthetic network, whose nodes correspond to the Colombian departments, thus yielding the flows $w^{G}_{ij}$ and $w^{R}_{ij}$, respectively.

Fig. 2. Epidemic modelling framework. (A) The disease dynamics follow a compartmental classification for ZIKV infection. Humans follow a susceptible-exposed-infectious-removed (SEIR)$_H$ classification, whereas mosquitoes follow a susceptible-exposed-infectious (SEI)$_V$ classification. The transmission dynamics of ZIKV occur through the interaction between susceptible humans $S_H$ and infected mosquitoes $I_V$, and between infected humans $I_H$ and susceptible mosquitoes $S_V$. (B) Summary of epidemiological parameters: "T dep." denotes temperature-dependent parameters; "T, G dep." denotes temperature- and geolocation-dependent parameters. Specific values for the parameters can be found in Refs. [35, 41, 42, 43].

[...] links based on each mobility network considered in this study. The migration process among subpopulations is modelled with Markovian dynamics, representing individuals who are indistinguishable with respect to their travel pattern, so that at each time step the same travelling probability applies to all individuals, without memory of their origin [44]. No other type of movement is considered.
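As an illustration only, the census aggregation and the two synthetic mobility laws can be sketched in Python with NumPy. The sketch assumes the gravity law $w_{ij} = C N_i^{\alpha} N_j^{\gamma} / f(d_{ij})$ with a power-law deterrence $f(d) = d^{r}$ (the text leaves $f$ unspecified, so the exponent $r$ is a hypothetical choice), fits $\log C$, $\alpha$, $\gamma$ and $r$ by ordinary least squares on the log-transformed non-zero flows, and evaluates the radiation law $w_{ij} = T_i N_i N_j / ((N_i + s_{ij})(N_i + N_j + s_{ij}))$. The helper names (`aggregate_to_departments`, `gravity_flows`, `fit_gravity`, `radiation_flows`) and the population-ratio rescaling are hypothetical; this is not the authors' code.

```python
import numpy as np

def aggregate_to_departments(w_muni, dept_of, n_dept, pop_ratio=None):
    """Sum municipality-level OD flows into department-level flows.

    dept_of[k] is the department index of municipality k. pop_ratio is a
    hypothetical per-origin rescaling factor, e.g. N_2015 / N_census.
    """
    A = np.zeros((len(dept_of), n_dept))
    A[np.arange(len(dept_of)), dept_of] = 1.0      # municipality -> department membership
    w = A.T @ np.asarray(w_muni, dtype=float) @ A
    np.fill_diagonal(w, 0.0)                       # drop intra-department flows (self-loops)
    if pop_ratio is not None:
        w *= np.asarray(pop_ratio, dtype=float)[:, None]
    return w

def gravity_flows(pop, dist, C, alpha, gamma, r):
    """Gravity law w_ij = C N_i^alpha N_j^gamma / d_ij^r (assumed power-law f)."""
    pop = np.asarray(pop, dtype=float)
    off = ~np.eye(len(pop), dtype=bool)            # off-diagonal pairs only
    w = np.zeros((len(pop), len(pop)))
    w[off] = (C * np.outer(pop**alpha, pop**gamma))[off] / dist[off] ** r
    return w

def fit_gravity(pop, dist, w_obs):
    """Least squares on log w_ij = log C + alpha log N_i + gamma log N_j - r log d_ij."""
    pop, w_obs = np.asarray(pop, dtype=float), np.asarray(w_obs, dtype=float)
    i, j = np.nonzero((w_obs > 0) & ~np.eye(len(pop), dtype=bool))  # log needs w_ij > 0
    X = np.column_stack([np.ones(len(i)), np.log(pop[i]), np.log(pop[j]), -np.log(dist[i, j])])
    coef, *_ = np.linalg.lstsq(X, np.log(w_obs[i, j]), rcond=None)
    return float(np.exp(coef[0])), float(coef[1]), float(coef[2]), float(coef[3])

def radiation_flows(pop, dist, T):
    """Radiation law w_ij = T_i N_i N_j / ((N_i + s_ij)(N_i + N_j + s_ij))."""
    pop = np.asarray(pop, dtype=float)
    n = len(pop)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # s_ij: population within radius d_ij of i, excluding origin i and destination j
            s = pop[dist[i] <= dist[i, j]].sum() - pop[i] - pop[j]
            w[i, j] = T[i] * pop[i] * pop[j] / ((pop[i] + s) * (pop[i] + pop[j] + s))
    return w

# Toy usage with hypothetical 1-D coordinates and populations.
x = np.array([0.0, 1.0, 3.0, 7.0])
dist = np.abs(x[:, None] - x[None, :])
pop = np.array([1e5, 2e5, 5e5, 1e6])
w_grav = gravity_flows(pop, dist, C=1e-3, alpha=0.5, gamma=0.7, r=2.0)
print(fit_gravity(pop, dist, w_grav))
```

On noise-free toy flows the fit recovers the generating parameters up to rounding. On a finite set of locations with distinct distances, the radiation flows leaving $i$ sum to $T_i (1 - N_i/M)$, where $M$ is the total population, rather than exactly $T_i$.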
The infection dynamics follow a homogeneous mixing approximation within each subpopulation, according to a compartmental classification of individuals based on the various stages of the disease. Specifically, humans are classified according to a susceptible-exposed-infectious-removed (SEIR)$_H$ compartmental model, whereas mosquitoes follow a susceptible-exposed-infectious (SEI)$_V$ compartmental model.

[...] factor $r_{se}$ modulating its exposure to the vector based on local socio-economic conditions. Figure 1C shows the fraction of the population exposed to ZIKV due to environmental and socio-economic [...]

Table 1. Basic properties of the mobility networks. The table reports the total number of nodes and links, the number of links shared with the census network, and the total volume of travellers of each mobility network under study. Self-loops are excluded.

[...] pattern. In general, higher rates of mobility mainly concern the northern and western parts of the country, where most of the urban centres are located, whereas lower rates of mobility concern the southern and eastern parts, which are mostly sparsely inhabited (see maps in Figure S5 in the Supplementary Material).

Restricting the analysis to the topological intersection of the mobility networks and the census network, we analyse the structural and flow properties of the networks.
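The coupling between Markovian human migration and the (SEIR)$_H$ and (SEI)$_V$ dynamics within each subpopulation can be illustrated with a simplified stochastic (chain-binomial) sketch in Python with NumPy. It assumes a daily travel probability $P_{ij} = w_{ij}/N_i$ derived from a mobility network, mosquitoes that do not travel, a Ross-Macdonald-type force of infection with biting rate `a` and transmission probabilities `b_vh` and `b_hv`, and mosquito deaths balanced by susceptible births. All parameter names and values are hypothetical placeholders; the temperature- and geolocation-dependent parameters, the socio-economic exposure factor $r_{se}$, and the seeding of imported cases used in the study are not reproduced.

```python
import numpy as np

# Hypothetical daily rates, for illustration only: biting rate a, transmission
# probabilities b_vh (vector -> human) and b_hv (human -> vector), human latency
# eps_h and recovery mu_h, vector incubation eps_v and death mu_v.
PARAMS = dict(a=0.3, b_vh=0.5, b_hv=0.5, eps_h=1 / 6, mu_h=1 / 5, eps_v=1 / 10, mu_v=1 / 14)

def travel_matrix(w, pop):
    """Daily travel probabilities P[i, j] = w_ij / N_i, stay probability on the diagonal."""
    P = np.asarray(w, dtype=float) / np.asarray(pop, dtype=float)[:, None]
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

def migrate(X, P, rng):
    """Markovian migration: each individual in node i moves to j with prob P[i, j],
    with no memory of where it came from."""
    out = np.zeros_like(X)
    for i in range(len(X)):
        out += rng.multinomial(X[i], P[i])
    return out

def leave(n, rate, rng, dt=1.0):
    """Chain-binomial draw: how many of the n individuals leave at `rate` within dt."""
    return rng.binomial(n, 1.0 - np.exp(-rate * dt))

def step(H, V, P, rng, p=PARAMS):
    """One daily step. H = (S, E, I, R) humans, V = (S, E, I) mosquitoes, one entry per node."""
    S, E, I, R = (migrate(x, P, rng) for x in H)   # only humans travel
    Sv, Ev, Iv = V
    N = np.maximum(S + E + I + R, 1)
    lam_h = p["a"] * p["b_vh"] * Iv / N              # force of infection on humans
    lam_v = p["a"] * p["b_hv"] * I / N               # force of infection on mosquitoes
    # Humans: S -> E -> I -> R.
    new_e, new_i, new_r = leave(S, lam_h, rng), leave(E, p["eps_h"], rng), leave(I, p["mu_h"], rng)
    S, E, I, R = S - new_e, E + new_e - new_i, I + new_i - new_r, R + new_r
    # Mosquitoes: infection/progression compete with death; deaths are replaced by
    # susceptible newborns, so each node's vector population stays constant.
    out_s = leave(Sv, lam_v + p["mu_v"], rng)
    inf_v = rng.binomial(out_s, lam_v / (lam_v + p["mu_v"]))
    out_e = leave(Ev, p["eps_v"] + p["mu_v"], rng)
    prog_v = rng.binomial(out_e, p["eps_v"] / (p["eps_v"] + p["mu_v"]))
    dead_i = leave(Iv, p["mu_v"], rng)
    births = (out_s - inf_v) + (out_e - prog_v) + dead_i
    V = (Sv - out_s + births, Ev + inf_v - out_e, Iv + prog_v - dead_i)
    return (S, E, I, R), V, new_i
```

Here migration is applied before local transmission within each step; that ordering, like the constant vector population, is a design choice of the sketch rather than a detail taken from the study.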
Table 2 reports [...]

We excluded from this analysis those departments with fewer than 100 total ZIKV cases reported by the official surveillance data, namely the departments of Nariño, Vichada, Choco, Vaupes, and Guainia (cumulative cases are reported in Table S1 of the Supplementary Material).

Fig. 5. Comparison between the estimated and observed ZIKV incidence. Weekly number of new ZIKV infections (per 100,000 population) as estimated from the stochastic ensemble output in the setting using the CDR-informed network (blue), the census network (black), the gravity network (orange), and the radiation network (purple). The bold line and shaded area refer to the median number of infections and the 95% CI of the model estimates. Black dots correspond to the official ZIKV incidence (per 100,000 population) reported by Colombia's National Institute of Health (right y-axis). For ease of comparison, surveillance data is scaled to the peak of the model estimates of the CDR-informed network. The inset graph shows the peak week as calculated from the model estimates. The observed epidemic peak was at week 2016-05 (green line).

[...] and detection rate ranging from 0.51% ± 0.23% for the gravity network to 0.72% ± 0.32% for the CDR-informed network (all p < 0.05). To quantify the simulations' performance in capturing the epidemic timing observed in each Colombian department, we calculate Pearson's r correlation between the model estimates generated by each mobility network and the observed surveillance time series, as shown in Figure 6A.
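The department-level evaluation can be sketched in Python with NumPy under stated assumptions: model and surveillance series are aligned weekly arrays per department, the 100-case exclusion is applied to the cumulative reported cases, and Pearson's r is computed with `numpy.corrcoef`. The function names and input layout are hypothetical, and the p-values and detection rates reported in the study are not reproduced here.

```python
import numpy as np

def department_correlations(model, observed, min_cases=100):
    """Pearson's r between model weekly incidence and surveillance, per department.

    model, observed: dicts mapping department name -> weekly series over the same weeks.
    Assumes `observed` holds weekly reported case counts, so its sum is the cumulative count.
    Departments with fewer than `min_cases` cumulative cases are skipped.
    """
    r = {}
    for dept, obs in observed.items():
        obs = np.asarray(obs, dtype=float)
        if obs.sum() < min_cases:
            continue
        r[dept] = float(np.corrcoef(np.asarray(model[dept], dtype=float), obs)[0, 1])
    return r

def scale_to_model_peak(observed, model):
    """Rescale surveillance so that its peak matches the model peak (display only)."""
    observed = np.asarray(observed, dtype=float)
    return observed * (np.max(model) / np.max(observed))

def peak_week(series):
    """Index of the week with the highest incidence."""
    return int(np.argmax(series))
```

Because Pearson's r is invariant to positive rescaling, scaling the surveillance series to the model peak, as done for display in Fig. 5, leaves the correlation unchanged.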
Namely, we investigate the correlation between the model-estimated weekly incidence and [...]

Fig. 7. Correlation by main properties of mobility networks. The plots show Pearson's r correlation (y-axis) by the total outflows $\sum_{j \neq i} w_{ij}$ of the CDR-informed network (A), census network (B), gravity network (C), and radiation network (D). Point size corresponds to population size. Colour code corresponds to node degree. Note that the scale of the colour bar changes across subplots in order to highlight the variability across networks.

Human mobility is in fact a key driver of ZIKV spread, and integrating this variable into spatial models can provide valuable insights for epidemic preparedness and response [11]. Timely, accurate, and comparative data on human mobility is therefore of paramount importance. For [...]

[...] either comparatively similar or substantially lower, with no added value in predicting the local epidemic. Specifically, we found that correlations are smaller for the CDR-informed network in those departments with smaller node degree, lower traffic, and smaller population size. This is the case for the departments of Putumayo, Amazonas, and San Andres. The latter is an archipelago approximately 750 km north of the Colombian mainland, thus having fewer con[...]

[...] CDR-derived mobility, is important for forecasting an emerging infectious disease like Zika [46].
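The network quantities used in Table 1 and Fig. 7 (links, total volume of travellers, links shared with the census network, node degree, and total outflow) can be computed from an OD matrix as in the following Python sketch. It assumes that a link exists wherever $w_{ij} > 0$, that node degree means out-degree, and that a node is counted when it has at least one incoming or outgoing link; these are illustrative choices, and `network_properties` is a hypothetical helper.

```python
import numpy as np

def network_properties(w, census=None):
    """Basic properties of a directed mobility network given as an OD matrix.

    w[i, j] is the flow from department i to department j. Self-loops are excluded.
    """
    w = np.array(w, dtype=float)                 # copy, so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)
    links = w > 0
    props = {
        "nodes": int((links.any(axis=0) | links.any(axis=1)).sum()),
        "links": int(links.sum()),
        "volume": float(w.sum()),                # total number of travellers
        "out_degree": links.sum(axis=1),         # number of destinations of each node
        "outflow": w.sum(axis=1),                # total outflow sum_{j != i} w_ij
    }
    if census is not None:
        c = np.array(census, dtype=float)
        np.fill_diagonal(c, 0.0)
        props["shared_links"] = int((links & (c > 0)).sum())
    return props
```

Per-department correlation coefficients can then be plotted against `outflow` and `out_degree` for each network, in the spirit of Fig. 7.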
Our modelling approach also contains assumptions and approximations, as discussed in Zhang et al. [35]. The transmission model has been calibrated using data from the French [...]

[...] increasing due to climate change. The response to many vector-borne diseases could benefit from the proposed modelling approach, which should be part of the epidemic response toolkits of public health authorities. Furthermore, in the ongoing COVID-19 pandemic, we believe this work is relevant not only because of the proposed methodologies, but also because it contributes to the ongoing discussion on the value of aggregated mobility estimates from CDR data that, with proper data protection and data privacy mechanisms, can be used for social impact applications and humanitarian action [28].

Zika virus, Key Facts. Available at World Health Organization.
WHO Director-General summarizes the outcome of the Emergency Committee regarding clusters of microcephaly and Guillain-Barré syndrome.
Differential Susceptibilities of Aedes aegypti and Aedes albopictus from the Americas to Zika Virus.
Zika virus in Gabon (Central Africa) - 2007: a new threat from Aedes albopictus? PLoS Neglected Tropical Diseases.
Evidence of perinatal transmission of Zika virus.
Evidence of sexual transmission of Zika virus.
Zika virus associated with microcephaly.
Low risk of a sexually-transmitted Zika virus outbreak. The Lancet Infectious Diseases.
Potential for Zika virus transmission through blood transfusion demonstrated during an outbreak in French Polynesia.
The global distribution of the arbovirus vectors Aedes aegypti and Ae. albopictus.
A review of models applied to the geographic spread of Zika virus.
Connecting mobility to infectious diseases: the promise and limits of mobile phone data.
Comparing large-scale computational approaches to epidemic modeling: agent-based versus structured metapopulation models.
On the use of human mobility proxies for modeling epidemics.
Human mobility: Models and applications.
A survey of results on mobile phone datasets analysis.
Quantifying the impact of human mobility on malaria.
Integrating rapid risk mapping and mobile phone call record data for strategic malaria elimination planning.
Impact of human mobility on the emergence of dengue epidemics in Pakistan.
Using mobile phone data to predict the spatial spread of cholera.
Quantifying seasonal population fluxes driving rubella transmission dynamics using mobile phone data.
Commentary: containing the Ebola outbreak - the potential and challenge of mobile network data.
Population mobility reductions associated with travel restrictions during the Ebola epidemic in Sierra Leone: use of mobile phone data.
Effects of human mobility restrictions on the spread of COVID-19 in Shenzhen, China: a modelling study using mobile phone data. The Lancet Digital Health.
Estimating the effect of social inequalities on the mitigation of COVID-19 across communities in Santiago de Chile.
Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle.
The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology.
Aggregated mobility data could help fight COVID-19.
Pan American Health Organization. Zika cumulative cases.
Evidence that calls-based and mobility networks are isomorphic.
Multiscale mobility networks and the spatial spreading of infectious diseases.
A universal model for mobility and migration patterns.
Modeling the spatial spread of infectious diseases: The GLobal Epidemic and Mobility computational model.
Spread of Zika virus in the Americas.
Genomic epidemiology supports multiple introductions and cryptic transmission of Zika virus in Colombia.
Establishment and cryptic transmission of Zika virus in Brazil and the Americas.
Weekly epidemiological reports from the Colombian National Institute of Health (INS).
Zika virus outbreak on Yap Island, Federated States of Micronesia.
Zika virus seroprevalence.
Countering the Zika epidemic in Latin America.
Nowcasting the spread of chikungunya virus in the Americas.
Impact of daily temperature fluctuations on dengue virus transmission by Aedes aegypti.
Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: Theory and simulations.
Quantifying the risk of local Zika virus transmission in the contiguous US during the 2015-2016 ZIKV epidemic.
Trade-offs between individual and ensemble forecasts of an emerging infectious disease. medRxiv.
Mapping global environmental suitability for Zika virus.
The global compendium of Aedes aegypti and Ae. albopictus occurrence. Scientific Data.

Data curation: DPe. Formal analysis: DPe. Investigation: DPe, MT, AV. Methodology: DPe, MT, AV. Resources: EFM, APyP, QZ. Software: DPe. Supervision: MLO, DPa, MT, AV. Validation: DPe, MT. Visualization: DPe. Writing - original draft: DPe. Writing - review & editing: DPe.

The mobile phone data used in this study is proprietary and subject to strict privacy regulations. Access was granted after signing a non-disclosure agreement (NDA) with the proprietor, who anonymized and aggregated the original data before giving access to the authors. The mobile phone data could be made available on request after an NDA is signed and discussed. The authors have declared that no competing interests exist. No IRB approvals were necessary.