key: cord-0053894-n4p3zumy authors: Sun, Haoyang; Dickens, Borame L.; Jit, Mark; Cook, Alex R.; Carrasco, L. Roman title: Mapping the cryptic spread of the 2015–2016 global Zika virus epidemic date: 2020-12-17 journal: BMC Med DOI: 10.1186/s12916-020-01845-x sha: 5bec832775fef1bc2022bed33c4d56e27a9712f1 doc_id: 53894 cord_uid: n4p3zumy BACKGROUND: Zika virus (ZIKV) emerged as a global epidemic in 2015–2016 from Latin America with its true geographical extent remaining unclear due to widely presumed underreporting. The identification of locations with potential and unknown spread of ZIKV is a key yet understudied component for outbreak preparedness. Here, we aim to identify locations at a high risk of cryptic ZIKV spread during 2015–2016 to further the understanding of the global ZIKV epidemiology, which is critical for the mitigation of the risk of future epidemics. METHODS: We developed an importation simulation model to estimate the weekly number of ZIKV infections imported in each susceptible spatial unit (i.e. location that did not report any autochthonous Zika cases during 2015–2016), integrating epidemiological, demographic, and travel data as model inputs. Thereafter, a global risk model was applied to estimate the weekly ZIKV transmissibility during 2015–2016 for each location. Finally, we assessed the risk of onward ZIKV spread following importation in each susceptible spatial unit to identify locations with a high potential for cryptic ZIKV spread during 2015–2016. RESULTS: We have found 24 susceptible spatial units that were likely to have experienced cryptic ZIKV spread during 2015–2016, of which 10 continue to have a high risk estimate within a highly conservative scenario, namely, Luanda in Angola, Banten in Indonesia, Maharashtra in India, Lagos in Nigeria, Taiwan and Guangdong in China, Dakar in Senegal, Maputo in Mozambique, Kinshasa in Congo DRC, and Pool in Congo. Notably, among the 24 susceptible spatial units identified, some have reported their first ZIKV outbreaks since 2017, thus adding to the credibility of our results (derived using 2015–2016 data only). CONCLUSION: Our study has provided valuable insights into the potentially high-risk locations for cryptic ZIKV circulation during the 2015–2016 pandemic and has also laid a foundation for future studies that attempt to further narrow this key knowledge gap. Our modelling framework can be adapted to identify areas with likely unknown spread of other emerging vector-borne diseases, which has important implications for public health readiness especially in resource-limited settings. Zika virus (ZIKV) is a flavivirus that is mainly transmitted by the Aedes mosquitoes [1] . Since its first isolation in a Ugandan forest in 1947 [2] , the virus has until recent years only caused regional outbreaks [3, 4] . In February 2016, a Public Health Emergency of International Concern was declared by the World Health Organization, as ZIKV swept through the majority of the countries in Latin America and the Caribbean, coinciding with an unusual increase in the number of microcephaly cases and other neurological disorders [5] . The reported ZIKV incidence declined substantially after 2016 [6] , but in the meantime, new scientific evidence continued to emerge and reveal new locations where ZIKV circulation had never been identified before [7] . More recently, the first autochthonous ZIKV case in Europe was reported in October 2019 with its source remaining unknown at the time of writing [8] , which underscores our limited understanding of the virus's epidemiology. Although ZIKV is no longer a public health emergency, the potential reoccurrence of future large-scale epidemics remains a concern, which necessitates continued investments in ZIKV research and surveillance in preparation for such an event [9] . To date, modelling studies have yielded important insights into the virus's transmission dynamics [10] [11] [12] [13] [14] , ecological niche [15] [16] [17] , and at-risk population size [18] , but little has been done to understand the gap between where cases may have already occurred and where they have been reported [19] . Due to the high proportion of ZIKV infections that are asymptomatic and its similarity in clinical presentation to other diseases such as dengue fever [3] , undetected or unreported spread of ZIKV was widely presumed, especially in countries with limited public health resources [17] . The identification of these locations is critical, as their healthcare systems may be overwhelmed during possible future waves of epidemics if ill prepared, and due to the risk of onward dissemination via air travel. Whilst a recent study used travel surveillance and viral genomes data to detect an unreported ZIKV outbreak in Cuba [20] , our understanding of the geographical areas that are likely to have experienced cryptic ZIKV spread is still seriously lacking at a global scale. To narrow this knowledge gap requires a novel modelling framework integrating a wide range of factors that determine the worldwide introduction of the virus and the subsequent autochthonous transmission. In this study, we aim to answer the following key questions, focusing on all countries or first-level subdivisions where no indigenous Zika cases were reported during the 2015-2016 global epidemic: (i) At least how many ZIKV infections were imported in each country or subdivision during 2015-2016 and to what extent were imported ZIKV infections underreported? (ii) Which countries or subdivisions were most likely to experience cryptic ZIKV spread during the 2015-2016 global epidemic based on currently available data? The weekly number of reported autochthonous Zika cases during 2015-2016 for each country in the Americas was published as bar charts by the Pan American Health Organization (PAHO) [21] . These data were digitised by the Andersen Lab for a genomic epidemiological study [22] and made publicly available [23] . We used the Web Plot Digitizer to extract the weekly number of notified Zika cases in Brazil during 2015 published by Lourenço et al. [24] , since these data were not published by the PAHO. To achieve a higher spatial resolution, we also obtained weekly cumulative case counts for each first-level subdivision of Colombia between Eweek 38 and Eweek 52 of 2016 from the Colombian National Institute of Health's website [25] . Each time series was then differenced to derive the weekly case counts and combined with data compiled by Siraj et al. [26] , which included weekly notified case data prior to Eweek 38 of 2016. For weekly autochthonous Zika cases reported by countries outside the Americas, we manually reviewed information compiled by HealthMap alongside additional data sources such as ReliefWeb [27, 28] (refer to the Additional File 1: supporting information [29] [30] [31] [32] [33] for more details). Each reported case was located to a first-level country subdivision whenever possible, and both confirmed and suspected autochthonous cases were included in our study (Additional File 2: Data S1). In addition to autochthonous case data, we obtained the reported total number of imported Zika cases during 2015-2016 for each US state, which was published by the Centres for Disease Control and Prevention (collection of imported case data for Florida and Texas was not needed for the study, as discussed in later sections) [34] . For all the other countries or subdivisions that did not report any autochthonous cases during 2015-2016, the imported case data were collected using HealthMap [27] . We requested the yearly average length of stay (LOS) on inbound tourism trips up to 2016 from the United Nations World Tourism Organization (UNWTO) [35] . For each country, we took the most recently available estimate for the analyses. In most cases, estimates were derived based on the check-in dates from arrival and departure cards or border survey data, but if such data were unavailable for a country, the estimate based on the Hotel Occupancy Survey was used instead [35] . In the rare event that the data were completely missing for a given country, we took the conservative approach and assumed the average LOS to be equal to the minimum value among all the countries under study, to avoid overestimating the number of imported ZIKV infections (refer to section "Simulation of the number of imported infections" for more details). The monthly number of air ticket bookings during 2015-2016 was obtained from the Official Airline Guide (OAG) for every origin-destination route with up to two connections. This was used to derive the weekly number of bookings assuming the daily number of bookings was uniform within each month for each route. For each firstlevel country subdivision with no incoming air passengers during 2015-2016, we performed an estimation of the most likely airport (within the same country) that the population would rely on when they returned home (details shown in the Additional File 1: supporting information [36] ). The subdivision was subsequently merged with the one where the identified airport was located, and they were modelled as a single unit from then on. For each country where autochthonous cases could not be located to the subdivision level, the entire country was treated as a single unit of analysis. Hence, the spatial unit of analysis in our study can be a single first-level country subdivision, a combination of subdivisions, or an entire country (hereinafter referred to as "spatial unit"). Countries with zero incoming air passenger during 2015-2016 were excluded from the analysis (refer to the Additional File 1: supporting information for more details). We obtained the global estimated 2015-population counts at a~1 km × 1 km resolution from the Socioeconomic Data and Applications Centre (SEDAC) [37] , which was used to derive the total population within each spatial unit. The daily temperature at 2 m between 2015 and 2016 was obtained from the European Centre for Medium-Range Weather Forecasts at~30 km × 30 km resolution [38] . Each temperature map was resampled using bilinear interpolation based on the cell size of the SEDAC population map. Subsequently, the resampled temperature pixel values were averaged within each spatial unit for each day, using the corresponding 2015-population counts as weights. We did not calculate raw average values because the population distribution can be highly uneven for some spatial units such as the Sichuan province of China, where the majority reside within the basin instead of the mountain or plateau regions. The methods employed to quantify the global environmental suitability of Ae. aegypti and Ae. albopictus, as well as the associated uncertainties, were described in Dickens et al. [39] . For each species, we obtained 250 vector suitability maps at a~5 km × 5 km resolution via bootstrapping. Our models performed reasonably well based on the outof-sample prediction accuracy, with a median true skill statistic of 0.84 (0.76-0.86) for Ae. aegypti and 0.71 (0. 66-0.78) for Ae. albopictus [39] . Following the same procedure as above, each vector suitability map was resampled according to the resolution of the SEDAC population map to derive the average suitability value weighted by the human population counts for each spatial unit. To map the cryptic spread of ZIKV during the 2015-2016 global epidemic, we first performed simulations to estimate the weekly number of ZIKV infections imported into each spatial unit that did not report any autochthonous cases during 2015-2016. Hereinafter, we will refer to these spatial units as susceptible spatial units, and the rest as donor spatial units. As a by-product of the simulation, a reporting index was derived to quantify the probability of reporting a case per importation for each susceptible spatial unit. Next, we estimated the virus's weekly basic reproduction number (R 0 ) for each susceptible spatial unit, which was then used to compute the probability that no onward spread following ZIKV importation occurred during 2015-2016. These probability estimates derived from our model, together with the local evidence of Aedes-borne disease transmission potential based on the existing literature, were used to identify susceptible spatial units with a high chance of cryptic ZIKV spread during 2015-2016 (refer to Fig. 1 for the schematic overview of methods, and Additional File 3 for the R code). To simulate the weekly number of imported ZIKV infections for each susceptible spatial unit, we first made the following definitions and assumptions. First, the local population of a spatial unit was defined as all who live in that spatial unit regardless of citizenship status. At any point in time, the size of a spatial unit i's local population who were visiting other spatial units was assumed to be approximately equal to the total number of spatial unit i's visitors. Hence, the total number of individuals located in a spatial unit at any point in time can be approximated by the total local population size, which was given by the SEDAC population estimates. Second, we assumed that air travellers with origin airport located in spatial unit i and destination airport located in spatial unit j were only made up of people belonging to the local population of spatial unit i or those of spatial unit j. Third, people who acquired ZIKV infection while visiting a spatial unit were assumed to remain asymptomatic by the end of their visits, so that all the autochthonous cases reported by a spatial unit can be assumed to come from its local population. Fourth, for any individual who belonged to the local population of a susceptible spatial unit and visited a donor spatial unit during 2015-2016, no immunity against ZIKV developed prior to his/her visit. A total of 10,000 simulations were performed, where in each simulation s = 1, …, 10, 000, we generated an estimate of the number of ZIKV infections imported from a donor spatial unit i to a susceptible spatial unit j during Eweek t, denoted by M ðsÞ i→ j;t . Of these, M ðsÞ i→ j;t ½i belonged to the local population of spatial unit i and M ðsÞ i→ j;t ½ j the local population of spatial unit j, which were estimated separately as follows. To begin with, we divided the autochthonous case count reported in spatial unit i during Eweek t, by the country-specific reporting rate based on O'Reilly et al.'s study to obtain the actual number of autochthonous infections I i, t [14] . Here, we used c i to denote the country to which spatial unit i belonged, and ρ c i to denote the percentage of autochthonous infections in c i that were reported, which was assumed to be time-invariant. In each simulation s, the reporting rate estimate ρ ðsÞ c i was drawn from a beta distribution with parameters a c i and Fig. 1 Schematic overview of the methods. Blue boxes denote input data, and dark orange boxes output estimates. Note that in the onward spread analysis, we further imposed thermal restrictions for ZIKV transmission and applied a threshold value for the estimated environmental suitability of Ae. aegypti to minimise false positives. Refer to the "Methods" section "Susceptible spatial units with cryptic spread of ZIKV" for more details b c i , under which the 2.5th and 97.5th percentiles of the resulting distribution equal the endpoints of the 95% credible interval produced by O'Reilly et al. [14] . For countries where the reporting rate estimate was unavailable, we assumed the parameters for the reporting rate uncertainty distribution to be equal to those for French Guiana, which had the highest reporting rate estimate among all the countries included in O'Reilly et al.'s study [14] . We considered this as a conservative approach, as it sought to avoid overestimating the weekly number of imported infections M i → j, t . Overall, countries not included in O'Reilly et al.'s study only accounted for less than 3% of the total number of autochthonous cases reported worldwide during 2015-2016. In the equations below, U i, t refers to the autochthonous case count reported by spatial unit i in Eweek t, and the same realisation ρ ðsÞ c i applied to all spatial units within country c i and for all Eweeks within each simulation s: Next, of the v i → j, t air travellers with origin airport located in spatial unit i and destination airport located in spatial unit j during Eweek t, v i → j, t [i] belonged to the local population of spatial unit i. In each simulation s, v ðsÞ i→ j;t ½i was drawn from a binomial distribution, with number of Bernoulli trials v i → j, t , and success probability π c i ;c j , where c i and c j referred to the countries to which spatial units i and j belonged respectively. Whilst for each country the UNWTO provided information on the arrivals of non-resident tourists/visitors at the national borders by country of residence, these data were incomplete and only allowed us to estimate the success probability π c i ;c j for less than 1.6% of the country pairs in our study. In addition, we did not find country-level indicators such as gross domestic product per capita to be informative for predicting π c i ;c j based on the very few estimates constructed from the UNWTO data. Hence, in each simulation s, π ðsÞ c i ;c j was drawn from a beta distribution with a substantial amount of variation around 0.5 to capture parameter uncertainty, followed by v ðsÞ i→ j;t ½i that was drawn from a binomial distribution as previously described. We then subtracted v ðsÞ i→ j;t ½i from v i → j, t to obtain v ðsÞ i→ j;t ½ j: It should be noted that within each simulation s, the realisation π ðsÞ c i ;c j also applied to all other routes with the same origin and destination countries. In other words, given a pair of origin and destination countries, the aforementioned probability value was generated only once in each simulation. Therefore, in simulation s, the first component of M ðsÞ i→ j;t was given by: The justification for the above calculation was as follows. In the denominator, POP i denoted the total number of people located in spatial unit i at Eweek t (or any point in time, as previously discussed). Here, we assumed a stable system in which the weekly arrival rate of visitors from the local population of any spatial unit k ≠ i was equal to the rate at which they exited spatial unit i, v i → k, t [k], for any Eweek t. On average, these individuals stayed in spatial unit i for LOS c i weeks during their visits based on the UNWTO's estimates. Applying Little's law originally developed in queueing theory [40] , we multiplied X k≠i v ðsÞ i→k;t ½k by LOS c i in each simulation s to obtain the estimated total number of people located in spatial unit i at Eweek t who did not belong to the local population of spatial unit i, which was then subtracted from POP i . Therefore, M ðsÞ i→ j;t ½i can be considered as a realisation of a random variable following a hypergeometric distribution, where v ðsÞ i→ j;t ½i random draws were obtained without replacement from a total number of Here, to ease the computation, we approximated M ðsÞ i→ j;t ½i by drawing from a binomial distribution instead. Note that we only included I ðsÞ i;t into the numerator of the success probability, since most autochthonous infections occurring prior to Eweek t had recovered by Eweek t, given a recovery rate of around 1/7-1/5 per day [11, 13] . In addition, the aforementioned I ðsÞ i;t individuals were presumably still able to travel because the vast majority were unreported and likely to be asymptomatic. Of the total (unobserved) number of incident autochthonous infections occurring in spatial unit i during Eweek t (denoted by L i, t ), M i → j, t [j] were visitors belonging to the local population of spatial unit j. Conditioning on L i, t , M i → j, t [j] followed a binomial distribution with the number of Bernoulli trials L i, t and a success probability as a function of i, j, and t. Since L i, t was reasonably large (i.e. greater than 20 in most cases), we modelled M i → j, t [j] as a Poisson random variable with mean parameter proportional to the total person-time at risk of the visitors from spatial unit j, and similarly for the number of autochthonous infections occurring in spatial unit i during Eweek t that belonged to the local population of spatial unit i (I i, t ). Hence, the ratio between their expected values was given by: Eweek t, the number of people located in spatial unit i was POP i , of whom ðv i→ j;t ½ j•LOS c i Þ were visitors from spatial unit j by Little's law [40] , and likewise to the local population of spatial unit i as derived earlier. In the denominator of equation ( * ), we included an additional factor (1 − η i, t ), which denoted the percentage of the local population of spatial unit i that were still susceptible to ZIKV infection in Eweek t (i.e. η i;t ¼ X t 0