key: cord-304165-4f84pc83 authors: Zhuang, Zian; Cao, Peihua; Zhao, Shi; Lou, Yijun; Wang, Weiming; Yang, Shu; Yang, Lin; He, Daihai title: Estimation of local novel coronavirus (COVID-19) cases in Wuhan, China from off-site reported cases and population flow data from different sources date: 2020-03-02 journal: nan DOI: 10.1101/2020.03.02.20030080 sha: doc_id: 304165 cord_uid: 4f84pc83
Background: In December 2019, a novel coronavirus (COVID-19) pneumonia hit Wuhan, Hubei Province, China and spread to the rest of China and overseas. The emergence of this virus coincided with the Spring Festival Travel Rush in China. It is possible to estimate the total number of COVID-19 cases in Wuhan by 23 January 2020, given the cases reported in other cities and population flow data between cities.
Methods: We built a model to estimate the total number of cases in Wuhan by 23 January 2020, based on the number of cases detected outside Wuhan in China, under the assumption that the same screening effort used in other cities was applied in Wuhan. We employed population flow data from different sources between Wuhan and other cities/regions up to 23 January 2020. The total number of cases was determined by maximum log-likelihood estimation.
Findings: From overall cities/regions data, we predicted 1326 (95% CI: 1177, 1484), 1151 (95% CI: 1018, 1292) and 5277 (95% CI: 4732, 5859) total cases in Wuhan by 23 January 2020, based on population flow data from the Changjiang Daily newspaper, Tencent, and Baidu, respectively. From separate cities/regions data, we estimated 1059 (95% CI: 918, 1209) and 5214 (95% CI: 4659, 5808) total cases in Wuhan by 23 January 2020, based on population flow data from Tencent and Baidu, respectively.
Conclusion: The source of population flow data and the estimation method both affect the estimates of local cases in Wuhan before the city lockdown.
Keywords: COVID-19; mobility; pneumonia; transportation; outbreaks.
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
In December 2019, a cluster of patients with pneumonia of unknown cause was reported in Wuhan, Hubei Province, China [1]. On 7 January 2020, a novel coronavirus, named COVID-19, was identified as the cause of this outbreak [2]. This COVID-19 virus shares the common characteristics of coronaviruses and is expected to become more virulent once it establishes efficient human-to-human transmission [3]. The emergence of this virus coincided with the Spring Festival Travel Rush in China. It was estimated that around 3 billion trips would be made in China between 10 January and 18 February 2020 [4]. Some researchers have pointed out the risk of regional and global disease spread during the Spring Festival Travel Rush [5]. However, because few severe cases had been reported by mid-January and most cases were linked to the Huanan Seafood Market in Wuhan, neither international nor regional travel restrictions were imposed on Wuhan at the early stage of this outbreak. On 13 January 2020, the first exported case from Wuhan was reported in Thailand, and case numbers increased dramatically after diagnostic kits became available in mid-January. As of 23 February 2020, there were 77042 laboratory-confirmed cases and 2445 deaths (46.93% and 75.91% in Wuhan, respectively) [6]. In recognition of the widespread outbreak, the government suspended all public transportation inside Wuhan from 23 January 2020, and some other cities also implemented regional travel restrictions [3].
At the early stage of this outbreak, cases might have been seriously underreported due to the lack of diagnostic kits and insufficient screening of suspected cases. Several efforts have been made to estimate the case numbers using different modelling approaches, with estimates ranging from 1732 to 4000 for the period 17-20 January [7, 8, 9]. In this study, we aimed to estimate the number of COVID-19 cases in Wuhan, using the cases exported to other large cities of mainland China and different sources of population flow data between Wuhan and these cities. The estimates were made up to 23 January 2020 (before the suspension of public transportation in Wuhan). We assumed that the exported cases were less likely to be underreported, as stringent temperature screening was implemented at airports and railway stations. We compared these estimates to the daily numbers of confirmed cases exported from Wuhan in surveillance data to evaluate the extent of underreporting.
We obtained the daily numbers of inbound and outbound domestic passengers travelling by air, train or road to/from Wuhan from three data sources:
2) Baidu map database (see: https://qianxi.baidu.com/). Population flow figures for 1 to 20 January 2020 were generated from location data of users of Baidu's mobile software, between Wuhan and twenty-seven cities/regions (Anhui, Beijing, Chongqing, Fujian, Gansu, Guangdong, Guangxi, Guizhou, Hainan, Hebei, Heilongjiang, Henan, Hunan, Hubei (outside Wuhan), Jiangsu, Jiangxi, Jilin, Liaoning, Ningxia, Shandong, Shanghai, Shanxi, Shaanxi, Sichuan, Tianjin, Yunnan, Zhejiang).
3) The news platform.
It was reported that there were 4.1 million outbound passengers from Wuhan in the first ten days of the Spring Festival Travel Rush (10-19 January 2020) via railways, highways and airways [10]. We divided this total evenly across the ten days to obtain an average daily population flow.
As shown in Fig 1, we also collected daily numbers of exported cases from Wuhan to other cities in China, and all secondary cases in family or hospital clusters were excluded from the analysis [11]. Eight cases from Guangdong were excluded due to the lack of travel history to Hubei prior to illness onset. For the remaining 371 cases that were not specified as secondary cases, we assumed that each case is an imported case with probability θ, independently of the others. The number of imported cases among these unspecified cases then follows a binomial distribution with parameters (n, θ), where θ represents the probability that a case was exported from Wuhan. Since most cases detected outside Wuhan by 23 January 2020 were imported cases [12], we estimated the daily numbers of imported COVID-19 cases from Wuhan for different values of θ (1, 0.9, 0.8); see Table S1 in the appendix.
Following [7], a binomial distribution is used, as in Eqn (1), where n is the total number of cases and p is the probability of finding any case overseas. The probability p is obtained by dividing the daily number of outbound international passengers from Wuhan by the population that the Wuhan airport serves and multiplying by the average time to case discovery; see Eqn (2). Similarly, we used the cases exported from Wuhan to other cities to estimate the number of local COVID-19 cases. We assumed a catchment population of 19 million travelling through the airport, railway stations and highways in Wuhan, and a 10-day delay on average, which accounted for the ...
Apart from predicting the number of COVID-19 cases from the overall data, we also estimated the number of cases from each separate city/region's data, since more detailed data may convey more information. We let the parameter λ denote the total number of COVID-19 cases. Based on the data obtained from each city/region, we estimated λ as the value most likely to have produced the observed results, by maximizing the log-likelihood. In Equation (3), l is the total log-likelihood and k is the total number of cities/regions included. The n and p represent the number of cases reported overseas and the probability of finding any case overseas, respectively. After obtaining λ, since the log-likelihood ratio statistic asymptotically follows a chi-square distribution [13], a 95% confidence interval (95% CI) based on the log-likelihood, l, can be calculated, from which we derive a 95% CI for the number of COVID-19 cases. In addition, we analyzed the correlation between the two data sources. We found that Pearson ...
For the overall cities/regions data, for each data source, we summed the total population flow from Wuhan to other cities/regions and the total number of cases in those cities. We then estimated the total number of COVID-19 cases and its 95% CI. We predicted 1326 (95% CI: 1177, 1484), 1151 (95% CI: 1018, 1292) and 5277 (95% CI: 4732, 5859) total cases in Wuhan by 23 January, depending on the data source; see Table 1. For the separate cities/regions data, we estimated the total number of cases, λ, using the maximum log-likelihood estimation (MLE) approach based on the population flow data from Tencent and Baidu; see Fig 2 (a,b). We then estimated the 95% CI for the number of COVID-19 cases. We predicted 1059 (95% CI: 918, 1209) and 5214 (95% CI: 4659, 5808) total cases in Wuhan by 23 January 2020, based on the Tencent and Baidu population flow data, respectively; see Table 2.
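The per-city estimation described above (a binomial model for exported cases, maximization of the total log-likelihood over λ, and a likelihood-ratio 95% CI) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the integer grid search, the upper bound `lam_max`, and the flow and case numbers are all hypothetical; only the 19 million catchment population and the 10-day delay are taken from the text.

```python
import math

CATCHMENT = 19_000_000  # catchment population assumed in the text
DELAY_DAYS = 10         # average delay (days) assumed in the text
CHI2_95_1DF = 3.841     # 95% quantile of chi-square with 1 degree of freedom


def detection_prob(daily_flow, catchment=CATCHMENT, delay=DELAY_DAYS):
    """Probability that a single Wuhan case is found at a destination:
    daily outbound flow / catchment population * delay window."""
    return daily_flow / catchment * delay


def log_binom_pmf(x, n, p):
    """log P(X = x) for X ~ Binomial(n, p), using log-gamma for stability."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + x * math.log(p) + (n - x) * math.log1p(-p)


def total_loglik(lam, cases, probs):
    """Sum of per-city binomial log-likelihoods for a candidate total lam."""
    return sum(log_binom_pmf(x, lam, p) for x, p in zip(cases, probs))


def estimate_total_cases(cases, probs, lam_max=20_000):
    """Grid-search MLE of lam plus a likelihood-ratio 95% interval.
    lam_max is an arbitrary search bound chosen for this sketch."""
    grid = range(max(cases), lam_max + 1)
    ll = [total_loglik(lam, cases, probs) for lam in grid]
    best = max(ll)
    mle = grid[ll.index(best)]
    # Keep every lam whose likelihood-ratio statistic stays below the cutoff.
    inside = [lam for lam, v in zip(grid, ll) if 2 * (best - v) <= CHI2_95_1DF]
    return mle, inside[0], inside[-1]


# Hypothetical daily flows (persons/day) and exported case counts for 3 cities.
flows = [20_000, 10_000, 5_000]
cases = [21, 11, 5]
probs = [detection_prob(f) for f in flows]
mle, ci_low, ci_high = estimate_total_cases(cases, probs)
print(f"lambda = {mle} (95% CI: {ci_low}, {ci_high})")
```

With the news-platform figure from the text (4.1 million passengers over ten days, i.e. 410,000 per day), `detection_prob(410_000)` gives a probability of roughly 0.22. A grid search is used here only because it is easy to read; a numerical optimizer would serve equally well.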
... [7, 9]. This is partly because the screening effort targeting travellers from Wuhan in other cities was much more effective than the local screening effort in Wuhan, given the worsening situation there. Among the three sources of population flow data, the Tencent data yielded the estimate of total COVID-19 cases closest to the official report, while the Baidu data yielded the result closest to the forecast by Imai et al. [8]. Meanwhile, in the sensitivity analysis, Table 1 and Table 2 show that slight changes in the probability that a case is exported from Wuhan have little impact on the forecast. Estimates of the population outflow provided by the news platform, Baidu and Tencent differ substantially, resulting in a wide range of predictions. We found that the Baidu and Tencent data show a significant linear relationship, so the two sources corroborate each other and suggest that the general pattern of the data is reasonable. After a simple linear transformation, the two sets of data become similar (see Fig 3). By further expanding the scope of epidemic monitoring, the gap between the estimated number and the officially reported cases will be further narrowed. According to our results, population flow statistics also play a significant role in prediction. At present, many studies use data from the Baidu and Tencent platforms [9, 15, 16]. It is not yet clear which data source yields the most accurate prediction, and forecasts based on a single data source may be inaccurate.
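The linear relationship between the Baidu and Tencent flow data noted above could be checked along the following lines. This is only a sketch: the text does not specify how the linear transformation was obtained, so ordinary least squares is assumed here, and the function names and the two flow series are hypothetical placeholders rather than the actual data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical per-city flows from two sources (not the real data).
source_a = [1000, 2000, 3000, 4000]
source_b = [250, 450, 650, 850]

r = pearson_r(source_a, source_b)
slope, intercept = linear_fit(source_a, source_b)
# Map source A onto the scale of source B so the two can be compared.
rescaled_a = [slope * v + intercept for v in source_a]
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.1f}")
```

A high correlation indicates only that the two sources agree on the relative pattern of flows across cities; their absolute levels can still differ, which is what drives the gap between the resulting case estimates.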
To sum up, it is necessary to make a comprehensive analysis based on different statistics before reaching any conclusions.
Population flow data were used to estimate the size of the COVID-19 epidemic in Wuhan, China before the lockdown. We found that population flow data from different sources may lead to significantly different estimates, although all estimates suggested much larger sizes than officially reported. Using cases reported off-site to estimate the size (course) of an epidemic in its epicenter is a common technique and will be used in future epidemics, especially when the epicenter lacks medical resources and cases are therefore under-reported. We argue that the reliability of population flow estimates should be taken into consideration.
Ethical approval and individual consent were not applicable. All data and materials used in this work were publicly available. Not applicable.
A Novel Coronavirus from Patients with Pneumonia in China
A novel coronavirus outbreak of global health concern. The Lancet
In 2020, the national passenger volume of Spring Festival transportation will reach about 3 billion person times
Pneumonia of unknown etiology in Wuhan, China: potential for international spread via commercial air travel
'Real time epidemic data', released on 23 February 2020 by
estimation of epidemiological parameters and epidemic predictions
Nowcasting and forecasting the Wuhan 2019-nCoV outbreak. Preprint published by the School of Public Health of the University of Hong Kong
Wuhan has sent more than 4 million passengers by railways, highways and airlines and 80 million passengers by public transport
Situation report of the pneumonia cases caused by the novel coronavirus by the National Health Commission of Each Province of People's Republic of China
Risk for Transportation of 2019 Novel Coronavirus (COVID-19) from Wuhan to Cities in China
The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics
'Report of Hubei Provincial Health Committee on pneumonia caused by new coronavirus', released on 24 January 2020 by Health Commission of Hubei Province
Population movement, city closure and spatial transmission of the 2019-nCoV infection in China
The impact of traffic isolation in Wuhan on the spread of 2019-nCoV