key: cord-0793996-q14x0i2c authors: Cao, Zhidong; Zhang, Qingpeng; Lu, Xin; Pfeiffer, Dirk; Wang, Lei; Song, Hongbing; Pei, Tao; Jia, Zhongwei; Zeng, Daniel Dajun title: Incorporating Human Movement Data to Improve Epidemiological Estimates for 2019-nCoV date: 2020-02-09 journal: nan DOI: 10.1101/2020.02.07.20021071 sha: ac35eeb6eaaf382fe0e123719f7dc409046e5d87 doc_id: 793996 cord_uid: q14x0i2c

Estimating the key epidemiological features of the novel coronavirus (2019-nCoV) epidemic is challenging, given the incompleteness and delays in early data reporting, in particular the severe under-reporting bias in the epicenter, Wuhan, Hubei Province, China. As a result, the current literature reports widely varying estimates. We developed an alternative geo-stratified debiasing estimation framework that combines human mobility data with case reporting data in three stratified zones, i.e., Wuhan, Hubei Province excluding Wuhan, and mainland China excluding Hubei. We estimated the latent infection ratio to be around 0.12% (18,556 people) and the basic reproduction number to be 3.24 in Wuhan before the city's lockdown on January 23, 2020. The findings based on this debiasing framework have important implications for the prioritization of control and prevention efforts.

Fig. 1. A geo-stratified debiasing estimation framework. The latent infection ratio was estimated from the number of people traveling from Wuhan to other destinations in mainland China before the lockdown and the confirmed 2019-nCoV case reporting data in these destinations. The estimate of the latent infection ratio enabled additional epidemiological modeling, such as calculating the basic reproduction number of 2019-nCoV and inferring the actual size of the epidemic in Wuhan.

To tackle this major challenge of under-reporting in Wuhan, we developed a geo-stratified debiasing estimation framework based on the following observation.
In contrast to Wuhan, other places in China have relatively ample screening and treatment resources, and they started various stringent surveillance programs after the Wuhan lockdown to screen and monitor people who had traveled from Wuhan. As such, the chance of under-screening and under-reporting is much lower outside Wuhan, especially outside Hubei Province. Our approach stratifies mainland China into three zones, i.e., Wuhan, Hubei Province excluding Wuhan, and mainland China excluding Hubei (Fig. 1). By combining data on the size of the population-level movement from Wuhan to destinations in the other zones with the official confirmed 2019-nCoV case counts in these destinations, we obtained a reliable estimate of the latent infection ratio in Wuhan before the lockdown, defined as the proportion of latently infected and undiagnosed infectious (L&I) persons in the population. In turn, this estimate enabled us to derive the basic reproduction number and the number of L&I persons in Wuhan before and after the lockdown, and to assess the progression of the epidemic nationally.

Fig. 2A shows the relationship between the number of people traveling from Wuhan from January 16 to January 22, 2020, and the number of confirmed cases in 12 representative provinces and municipalities from January 23 to January 29, 2020. The lag between these two time intervals accommodates the commonly assumed 7-day incubation period of 2019-nCoV (3). These representative provinces and municipalities are those that received at least 30,000 travelers from Wuhan within the three-week window from January 1 to January 22, 2020, prior to the lockdown. Similarly, Fig. 2B presents the same relationship for 14 cities within Hubei Province, each of which received at least 100,000 travelers from Wuhan. From Fig. 2A and Fig. 2B, we observe a statistically significant positive linear relationship between the number of travelers from Wuhan and the cumulative case count in most provinces and cities. For two provinces, Zhejiang and Jiangsu (Fig. 2A), and two Hubei cities, Xiaogan and Huanggang (Fig. 2B), the cumulative case counts grow superlinearly, which can be explained by known significant secondary local transmission. Fig. 2C presents zone-level aggregated results.

author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.02.07.20021071 doi: medRxiv preprint

Under our proposed geo-stratified debiasing framework, the latent infection ratio among people traveling from Wuhan to destinations in mainland China excluding Hubei within the three-week window prior to the lockdown is estimated at 0.12% (95% CI: 0.09%-0.15%). Based on this ratio, and assuming that it was the same for people traveling from and remaining in Wuhan, we inferred that there were in total 18,556 (95% CI: 14,134-22,978) L&I persons in Wuhan before the lockdown. Among these L&I persons, 10,887 (95% CI: 8,292-13,481) stayed in Wuhan on January 23, 2020, when the lockdown was initiated, and the remaining 7,669 (95% CI: 5,842-9,497) traveled to other parts of mainland China. Of the traveling L&I persons, between 4,644 and 7,549 went to other places in Hubei Province, and between 1,198 and 1,947 to places outside Hubei. We followed a similar geo-stratified debiasing process, relying on the more reliable case counts outside Hubei, to estimate the basic reproduction number R0 of 2019-nCoV. Our R0 estimate is 3.24, indicating that, on average, one infected case is expected to generate 3.24 secondary infected cases prior to isolation or death.
This value is within the range reported for the 2003 SARS epidemic (2-5) (8), and higher than those of influenza (2-3) (9) and Ebola (1.5-2.5) (10). Compared with the WHO's estimate (1.4-2.5) and other recently reported estimates (2.2-3.1) for 2019-nCoV (1, 3-5), our estimate is at the high end. This can be partially explained by the fact that most published estimates are based on all confirmed 2019-nCoV cases, including the Wuhan case counts, which are prone to under-screening and under-reporting biases. If a large number of infected and infectious individuals do not develop clinical disease, the actual value of R0 could be even higher. Building on the estimated latent infection ratio, R0, and the number of L&I persons in Wuhan when the lockdown started, we developed a customized epidemiological model to simulate the progression of the 2019-nCoV epidemic after the lockdown of Wuhan (see supplementary materials for details). Given the stringent nature of the lockdown measures and the continuing under-reporting bias in Wuhan, the epidemic within and outside Wuhan was modeled differently.

This report introduces a geo-stratified approach for estimating the latent infection ratio in Wuhan in the presence of severe under-screening and under-reporting biases. The estimated latent infection ratio enabled us to conduct further epidemiological investigations to better understand the ongoing 2019-nCoV epidemic. Our findings have important implications for the development of 2019-nCoV control and prevention policies in mainland China.
First, previous estimates of R0 are likely to have underestimated the transmissibility of 2019-nCoV, mainly due to the under-reporting bias during the early phase in Wuhan. Second, the actual epidemic size in Wuhan and Hubei is larger than what has so far been reported based on confirmed case data. Third, in parts of mainland China outside Hubei, the risk of further outbreaks is likely to be limited, as long as locally occurring secondary and tertiary transmissions can be detected and isolated effectively. The recent quarantine lockdown of Wenzhou was a response to the detection of large numbers of secondary transmissions. To prevent this epidemic from becoming a pandemic, it is absolutely key that cases of clinical disease are detected and isolated as early as possible. In addition, all high-risk contacts need to be traced to identify secondary transmissions. The public health surveillance system in China is facing a tremendous challenge given the population size, and the sensitivity of the system needs to be enhanced. It is therefore paramount that available resources are used optimally. An understanding of the spatial distribution of the L&I persons is critical in this context, since they are the reservoir of clinical cases and of secondary transmission, although to a lesser extent than clinical cases. The annual migration around the Lunar New Year compounded the situation by immensely increasing movement across the country. The human movement data utilized in this study provide a robust data source for estimating the likely dissemination of 2019-nCoV cases across the country (Fig. 3B), providing important epidemiological information that is much less affected by delayed- or under-reporting bias. The management of the epidemic over the coming weeks, when people travel back from their hometowns to their workplaces, will be critical to bringing it under control. We recommend that national and local authorities process real-time human movement data through mathematical models to forecast the spatio-temporal dynamics of further 2019-nCoV outbreaks across mainland China. This would allow the development of early detection, isolation and quarantine strategies tailored to the very dynamic epidemiological situation, as well as the identification of potential problems in the implementation of local disease control policies and the design of any necessary adjustments.

All rights reserved. No reuse allowed without permission.

The daily counts of confirmed 2019-nCoV cases were collected from the National Health Commission of the People's Republic of China through its website at http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml. Additional information about these confirmed cases was retrieved from the websites of the official provincial and municipal health commissions throughout mainland China. A list of these websites is provided by the National Health Commission of the People's Republic of China at http://www.nhc.gov.cn/xcs/yqfkdt/gzbd_index.shtml. The human movement data used in the study consist of aggregated statistics provided by the two largest telecommunications operators in China, covering the daily counts of mobile phone users traveling from Wuhan to other destinations in mainland China between January 1, 2020 and January 22, 2020. Given the very high mobile phone penetration rate in China and the 80% market share held by these two operators, these statistics are expected to be representative of the actual size of the human migration originating from Wuhan. Note that some users may have returned to Wuhan before traveling outbound again.
These users were counted more than once in the human movement dataset. Due to the sensitive nature of the telecommunications data records, data processing and aggregation were conducted in-house in the telecommunications operators' secure computing environment by their own staff, without any participation from the author team. The applicable law, "Provisions on Protection of Personal Information of Telecommunications and Internet Users (Mainland China)" (11), and the guidelines from the Global System for Mobile Communications Association (GSMA) on the protection of privacy in the use of mobile phone data (12) were followed. The authors had access only to exported data aggregated at the provincial and municipal levels. No personal information was processed in the analysis of this study. This study was approved by the Biomedical Research Ethics Review Board of Chinese Academy of Sciences Institute of Automation (approval #IA-202001).

Among all mainland China provinces and municipalities, we first selected those that received at least 30,000 travelers from Wuhan between January 1, 2020 and January 22, 2020. From this list of 12 provinces and municipalities, Zhejiang Province and Jiangsu Province were excluded from the latent infection ratio estimation because of known substantial secondary local transmission, given that the first objective of our study is to estimate the latent infection ratio in Wuhan during the early stage of the epidemic. For the remaining 10 provinces and municipalities, to accommodate the commonly assumed 7-day incubation period of 2019-nCoV, we calculated the ratio between the count of confirmed cases from January 23, 2020 to January 29, 2020, and the number of people traveling from Wuhan from January 16, 2020 to January 22, 2020. Table S1 presents these counts and the calculated ratios.
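The selection and lagged-window steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`window_sum`, `lagged_pairs`, `latent_infection_ratio`) are hypothetical; case inputs are assumed to be daily new confirmed counts, summed over January 23-29 as the Methods describe; repeat trips by the same user are not de-duplicated; and the fitted slope is assumed to be an ordinary least-squares slope through the origin, since the text does not say whether an intercept was used. The final scaling by the average share of non-secondary cases follows the estimation model in the supplementary methods (about 0.776 in the paper).

```python
from datetime import date, timedelta

# Windows described in the text (inclusive bounds).
SELECTION_WINDOW = (date(2020, 1, 1), date(2020, 1, 22))  # three weeks before lockdown
TRAVEL_WINDOW = (date(2020, 1, 16), date(2020, 1, 22))     # last week before lockdown
CASE_WINDOW = (date(2020, 1, 23), date(2020, 1, 29))       # shifted by the assumed 7-day incubation


def window_sum(daily, window):
    """Sum a {date: count} mapping over an inclusive (start, end) window."""
    start, end = window
    total, day = 0, start
    while day <= end:
        total += daily.get(day, 0)
        day += timedelta(days=1)
    return total


def lagged_pairs(outbound, cases, min_travelers=30_000, exclude=()):
    """Return {destination: (travelers, confirmed)} for destinations that pass
    the traveler threshold and are not excluded for secondary transmission.
    `outbound` and `cases` map destination -> {date: daily count} (hypothetical layout)."""
    pairs = {}
    for dest, daily_out in outbound.items():
        if dest in exclude or window_sum(daily_out, SELECTION_WINDOW) < min_travelers:
            continue
        travelers = window_sum(daily_out, TRAVEL_WINDOW)
        confirmed = window_sum(cases.get(dest, {}), CASE_WINDOW)
        pairs[dest] = (travelers, confirmed)
    return pairs


def latent_infection_ratio(pairs, p_non_secondary):
    """Least-squares slope of confirmed cases on travelers through the origin
    (an assumption), scaled by the average share of non-secondary cases."""
    sxy = sum(n * c for n, c in pairs.values())
    sxx = sum(n * n for n, _ in pairs.values())
    return (sxy / sxx) * p_non_secondary
```

Per-destination ratios of the kind listed in Table S1 follow as `{d: c / n for d, (n, c) in pairs.items()}`, and the Hubei-city selection behind Fig. 2B would correspond to `min_travelers=100_000`. A through-origin slope is used here because zero travelers should imply zero imported cases; a model with an intercept is an equally plausible reading of the text.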
Seven provinces and municipalities provided detailed information on secondary local transmission when publishing daily case counts. Table S2 presents the cumulative data for these places. The estimated latent infection ratio (0.12%) reported in the main manuscript was calculated from the data in Tables S1 and S2, following the estimation model presented below. Let $C_i$ denote the cumulative number of confirmed cases in province $i$, and $N_i$ the number of people traveling from Wuhan to province $i$. Mainland China has 31 provincial-level divisions. Given the available data from the 10 selected provinces and municipalities, the latent infection ratio $\hat{r}$, i.e., the proportion of latently infected and undiagnosed infectious (L&I) persons among all people traveling from Wuhan, was calculated following the standard approach for ratio estimation (13), as follows:

$$\hat{r} = \hat{b}\,\bar{p},$$

where $\hat{b}$ denotes the slope of the fitted linear relationship of $(N_i, C_i)$, and $\bar{p}$ the average proportion of non-secondary cases (see Table S2), estimated at about 0.776. The resulting estimated latent infection ratio is 0.12% (95% CI: 0.09%-0.15%). Using this ratio, we in turn estimated that there were 18,556 (95% CI: 14,134-22,978) L&I persons (out of a population of 15.34 million) in Wuhan before the city's lockdown. Among these L&I persons, 7,670 traveled to other destinations in mainland China, while the remaining 10,886 stayed in Wuhan.

The standard SEIR model assumes four categories of persons: Susceptible, Exposed (corresponding to the latently infected cases in the main manuscript), Infectious, and Recovered. To investigate the 2019-nCoV epidemic, we adopted the SEIRDC model (14), which enhances the standard SEIR model by introducing an additional category of persons, Dead, and an auxiliary variable $C(t)$ that tracks the cumulative number of infectious persons during the outbreak. Furthermore, to take human movement into account, we customized the SEIRDC model for two epidemic areas: Wuhan and non-Wuhan (mainland China excluding Wuhan).
The model is as follows.

Here, $S(t)$, $E(t)$, $I(t)$, $R(t)$, and $D(t)$ denote the numbers of susceptible, exposed, infectious, recovered and dead persons, respectively, in Wuhan, and $N(t)$ denotes the total population size. The corresponding variables with a subscript refer to the non-Wuhan area. A time-dependent migration rate gives the proportion of people moving from Wuhan to the non-Wuhan area. In this formulation, $1/\sigma$ is the incubation period (7 days), $1/\gamma$ the infectious period (9 days), and $\mu = \gamma f/(1-f)$ is the death rate, where the fatality rate $f$ is estimated at 1.5%. These values were adopted from WHO documents (3). We set December 4, 2019 as the starting date of the outbreak. By adopting the epidemiological characteristics reported in (1), we inferred the parameters of this customized SEIRDC model to fit our estimated number of L&I persons in Wuhan before the city's lockdown. Following (4), the basic reproduction number $R_0 = 3.24$ was obtained by fitting the growth of the confirmed case counts in mainland China excluding Hubei up to January 22, 2020. The model-predicted epidemic size of 2019-nCoV is presented in Fig. 3 in the main manuscript.

Table S1. The number of people traveling from Wuhan (N), the cumulative number of confirmed cases (C), and the latent infection ratio (C/N) as of January 29, 2020.
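The two-area model's equations did not survive extraction, so the following is only a generic two-area SEIRDC sketch in Python to show how the stated parameters fit together; it is not the authors' formulation. Assumptions (all hypothetical): transmission is frequency-dependent with the same rate in both areas; only susceptible and exposed people migrate, at a constant `migration` rate; the death rate is taken as μ = γf/(1−f), which makes the fraction of infectious people who die, μ/(μ+γ), equal to f; the transmission rate is set from R0 = β/(γ+μ), which holds for this structure without migration; and integration uses forward Euler. The names `Params`, `derivatives`, and `simulate` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Params:
    r0: float = 3.24          # basic reproduction number reported in the text
    incubation: float = 7.0   # 1/sigma, days
    infectious: float = 9.0   # 1/gamma, days
    fatality: float = 0.015   # f, fraction of infectious persons who die
    migration: float = 0.0    # hypothetical daily outflow rate from Wuhan


AREAS = ("wuhan", "other")  # compartments per area: [S, E, I, R, D, C]


def derivatives(state, p):
    sigma = 1.0 / p.incubation
    gamma = 1.0 / p.infectious
    mu = gamma * p.fatality / (1.0 - p.fatality)  # so that mu / (mu + gamma) == f
    beta = p.r0 * (gamma + mu)                    # from R0 = beta / (gamma + mu)
    rates = {}
    for area in AREAS:
        s, e, i, r, d, c = state[area]
        n = s + e + i + r                         # living population of the area
        infections = beta * s * i / n
        rates[area] = [
            -infections,
            infections - sigma * e,
            sigma * e - (gamma + mu) * i,
            gamma * i,
            mu * i,
            sigma * e,                            # C(t): cumulative infectious
        ]
    # Hypothetical migration: susceptible and exposed people leave Wuhan.
    for k in (0, 1):
        flow = p.migration * state["wuhan"][k]
        rates["wuhan"][k] -= flow
        rates["other"][k] += flow
    return rates


def simulate(state, p, days, dt=0.1):
    """Forward-Euler integration; returns a new state after `days` days."""
    for _ in range(int(round(days / dt))):
        rates = derivatives(state, p)
        state = {a: [x + dt * dx for x, dx in zip(state[a], rates[a])] for a in AREAS}
    return state
```

Because every outflow from one compartment is an inflow to another (or to D), the total S+E+I+R+D across both areas is conserved, which is a useful sanity check; a production version would more likely use an adaptive ODE solver than fixed-step Euler.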
References
A novel coronavirus outbreak of global health concern.
World Health Organization.
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.
Uncertainty in SARS epidemiology.
Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures.
Transmissibility of 1918 pandemic influenza.
11. Ministry of Industry and Information Technology of the People's Republic of China. Regulations on the protection of personal information of telecommunications and Internet users.
12. GSMA guidelines on the protection of privacy in the use of mobile phone data for responding to the Ebola outbreak.
13. Density ratio estimation in machine learning.
14. Comparative estimation of the reproduction number for pandemic influenza from daily case notification data.

The authors thank J. Yang, L. Li, H. Zhou, Y. Ye and K. Tang who helped

Table S2. The cumulative number of confirmed cases as of January 29, 2020, with information on secondary transmission, for the seven provinces and municipalities that provided such information.