key: cord-0693051-ry3m6x8c authors: Zhao, Shilei; Chen, Hua title: Modeling the Epidemic Dynamics and Control of COVID-19 Outbreak in China date: 2020-02-29 journal: nan DOI: 10.1101/2020.02.27.20028639 sha: 75e68869f9b65bca661e768402395d9dede6de2c doc_id: 693051 cord_uid: ry3m6x8c
Coronavirus disease 2019 (COVID-19) has spread rapidly across China and to more than 30 other countries in the last two months. COVID-19 has multiple characteristics distinct from other infectious diseases, including high infectivity during incubation, a time delay between the real dynamics and the daily observed case numbers, and the effects of multiple quarantine and control measures. We develop a model, SUQC, that adequately characterizes the dynamics of COVID-19 and explicitly models the control by artificial measures, which makes it more suitable for analysis than other existing epidemic models. The SUQC model is applied to the daily released data of confirmed infections to analyze the outbreak of COVID-19 in Wuhan, Hubei (excluding Wuhan), China (excluding Hubei) and four first-tier cities of China. We find that, before January 30, 2020, all these regions except Beijing have a reproductive number R>1, and after January 30, all regions have a reproductive number R<1, indicating the effectiveness of the quarantine and control measures in inhibiting COVID-19. The confirmation rate of Wuhan is 0.0643, significantly lower than the 0.1914 of Hubei (excluding Wuhan) and the 0.2189 of China (excluding Hubei), but it increases to 0.3229 after February 12, 2020, when clinical diagnosis was adopted. The estimated number of un-quarantined infected individuals in Wuhan is as high as 3,509 on February 12, 2020 and decreases to 334 on February 21, 2020. After fitting the model with recent data, we predict that the end times of COVID-19 in Wuhan and Hubei are around late March, in China (excluding Hubei) around mid-March, and in the four first-tier cities before March 2020.
A total of 80,511 individuals in the whole country are estimated to be infected, among which 49,510 are from Wuhan, 17,679 from Hubei (excluding Wuhan), and the remaining 13,322 from other regions of China (excluding Hubei). We suggest that the rigorous quarantine and control measures be kept in place until March in Beijing, Shanghai, Guangzhou and Shenzhen, and until late March in Hubei. The model can also be useful for predicting the trend of the epidemic and providing a quantitative guide for other countries at high risk of an outbreak, such as South Korea, Japan and Iran.
The outbreak of coronavirus disease 2019 (COVID-19) was initially identified in mid-December 2019 in Wuhan, China [1, 2]. The earliest patients in Wuhan were related to exposure at a seafood market. Later, the number of patients grew drastically due to human-to-human transmission [3]. The incubation period of COVID-19 is reported to be 3-7 days, and at most 14 days, varying greatly among patients [2]. The novel coronavirus is believed to be infectious during the incubation period, when patients show no symptoms [4], an important characteristic differentiating COVID-19 from its close relative SARS. Considerable measures have been implemented to control the outbreak in Wuhan and China, mainly quarantine to reduce transmission. On Jan 23rd, 2020, Wuhan restricted travel out of the city. Any person exposed to COVID-19 is required to self-isolate for 14 days. Around Jan 25th, 2020, a nucleic acid kit was developed to diagnose patients. On Feb 12th, 2020, clinical diagnosis began to be used to assist the confirmation of infection in Hubei province. Nonetheless, COVID-19 has spread to all provinces of China and more than 30 other countries in the last two months [5, 6]. COVID-19 has three features that make it hard to describe with existing epidemic models such as SIR and SEIR [7, 8].
Firstly, COVID-19 has a relatively long incubation period, which causes a time delay between the real dynamics and the daily observed case numbers. Secondly, the epidemic trend heavily depends on multiple artificial factors, including local medical resources, quarantine measures, and the efficiency of confirmation approaches, which should be explicitly modeled. For example, the outbreak is more severe in Wuhan than in other cities in China, which strains the local medical resources; the infected therefore need a longer time to be confirmed and reported in the officially released numbers. This potentially leads to a larger difference between real and reported infected cases in Wuhan than in other places. It could also explain why a sudden increase in confirmed infected cases was observed when clinical diagnosis was adopted for confirmation in Wuhan. Lastly, quarantine measures are widely implemented, and the quarantined have a lower chance of infecting susceptible individuals. This is critical for controlling the spread across China. The characteristics of the COVID-19 outbreak and its control are distinct from those of existing infectious diseases, and the existing epidemic models cannot be applied directly to describe the observed data. We thus propose a simple SUQC model (Susceptible, Un-quarantined infected, Quarantined infected, Confirmed infected). SUQC divides the infected individuals into un-quarantined, quarantined but not confirmed, and confirmed. Among the three types, the confirmed number is the data we can directly observe from the officially released reports. Only the un-quarantined infected are able to infect susceptible individuals and affect the development of the epidemic. In our proposed model, the quarantine rate parameter quantifies the strength of the quarantine policy on the development of the epidemic, and the confirmation rate parameter measures the efficiency of confirmation based on the released data.
The two parameters can be estimated by fitting the observed confirmed cases over time. Note that the model contains only four variables and three parameters to model both artificial factors and characteristics of the epidemic using the data we can directly observe. We expect that the simplified model will not overfit the data given the short time span, but will adequately characterize the essential dynamics. We apply the SUQC model to the daily released numbers of confirmed cases in Wuhan city, Hubei province (excluding Wuhan), China (excluding Hubei) and four first-tier cities: Beijing, Shanghai, Guangzhou and Shenzhen. The parameters of the model were inferred and used to predict the future trends of the epidemic in China. The data of confirmed infected numbers include 33 consecutive daily records from Jan 20th, 2020 to Feb 21st, 2020 released by the National Health Commission of the People's Republic of China (see Table S1).
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. medRxiv preprint doi: https://doi.org/10.1101/2020.02.27.20028639
The parameters in the model, such as the quarantine rate, are time-varying; we thus divided the data into different stages (according to the changes of measures during the epidemic) and assumed that the parameters within each stage are relatively stable. We defined the time before Jan 30th, 2020 as stage Ⅰ, and the time after Jan 30th, 2020 as stage Ⅱ. To guarantee enough data points within the two stages, the start and end of the stages may vary by one or two days. Wuhan has recently undergone stricter quarantine and transmission-limiting measures, and clinical diagnosis was adopted after Feb 12th; we thus further performed a stage Ⅲ analysis of the dynamics in Wuhan using data from Feb 13th.
Figure 1 shows the inference and prediction of the epidemic dynamics of Wuhan using stage Ⅰ, Ⅱ and Ⅲ data respectively. The first 15 daily data points of stage Ⅱ (from Jan 28th to Feb 11th) were used to fit the model and infer the parameters. The following 10 daily data points were used as test data for evaluating the performance of the model. With the inferred parameters, we further plot the long-term predictions of the numbers of total infected (I), un-quarantined infected (U), quarantined infected (Q) and cumulative confirmed infections (C) in Wuhan (Figure 1(A)). The end time (when the increment of confirmed infections equals zero) is predicted to be 147 days from Jan 28th, 2020. The total number of infected individuals is 62,577 (Table 1). We can do a similar analysis using the stage Ⅰ data of Wuhan. The stage Ⅰ data are informative for predicting the epidemic trend assuming no rigorous quarantine and control measures. The first 10 daily data points (from Jan 20th) were used to infer the parameters. The remaining data points were used to test the performance of the model (Figure 1(C-D)). As clearly seen in Figure 1(C), the predicted numbers of C(t) and I(t) increase dramatically and are far beyond the observed numbers after Jan 21. The number of total infected can be as large as 8,923,823, and the epidemic lasts for a much longer time (328 days, see Table 1). The dramatic difference between predictions from stage Ⅰ and stage Ⅱ data indicates that the preventive measures and quarantines, such as travel restrictions, are very efficient in controlling the outbreak of the epidemic. Since stricter quarantine and traffic control measures were recently implemented to inhibit the spread of COVID-19 in Wuhan, and clinical diagnosis was adopted after Feb 12th, we also analyzed the stage Ⅲ data (from Feb 13th). The estimated quarantine rate is 0.6185, much higher than the 0.3917 estimated from stage Ⅱ data.
The total number of infected individuals is estimated to be 49,510, indicating a further acceleration of the end of the epidemic (Table 1). A similar analysis was performed on stage Ⅰ and Ⅱ data of Hubei province (excluding Wuhan), the whole country (excluding Hubei), and four first-tier cities in China (Figures 2-3, S1-S4, Tables 1-2). Overall, the model predictions are highly accurate. We see similar trends in these regions: the predicted numbers of infected differ markedly between the results from the two stages of data, indicating the necessity and efficiency of quarantine and control measures. We note that even with stage Ⅰ data, Beijing has a reproductive number smaller than 1 (0.8840; Figure S1, Table 2), indicating an early-stage prompt and effective response to COVID-19. Comparing with Figures 2-3 and S1-S4, we notice that the difference between C(t) and I(t) is the largest in Wuhan. Wuhan has more infected individuals than any other place in China (more than 50% of the total), far beyond the limit of local clinical resources, leading to a long waiting time for confirmation and the lowest confirmation rate. Using stage Ⅱ data, the estimated number of un-quarantined infected individuals U(t) in Wuhan (Figure 1(A)) on Feb 12th is still as high as 3,509. The person-to-person transmission would last for more than two months, until mid-May 2020. However, estimated with stage Ⅲ data, the quarantine rate of Wuhan increases to 0.6185 and the number of un-quarantined infected individuals decreases to 334 on Feb 21st, 2020 (Figure 1(E)). After fitting the model with the recent data from stage Ⅱ and stage Ⅲ (for Wuhan), we make a series of predictions about the future dynamics of the COVID-19 outbreak in China.
S, the number of susceptible individuals, is defined as in existing infectious disease models, e.g. SIR and SEIR.
U, the number of infected and un-quarantined individuals, who can be either presymptomatic or symptomatic. Different from E in the SEIR model, U individuals are infectious and can turn a susceptible individual into an un-quarantined infected one.
Q, the number of quarantined infected individuals. The un-quarantined infected become quarantined infected by isolation or hospitalization, and lose the ability to infect the susceptible.
C, the number of confirmed infected cases. The number of confirmed infections is released by the official agency or media, and may be the only observed variable that we can access. Note that C is usually smaller than the number of real infected individuals, due to the limited sensitivity of diagnosis methods. The duration of incubation can also cause a time delay in confirmation. Nevertheless, C is still the number useful for monitoring and predicting the trend of the epidemic dynamics.
Besides the aforementioned variables, we have a composite variable I(t) = U(t) + Q(t) + C(t), representing the real cumulative number of infected individuals at time t. The limitations of detection methods and medical resources can greatly delay the confirmation process, so that the confirmation proportion C/I is less than 1 and time-varying. R, the number of removed individuals, is not included in the model, unlike in the SIR/SEIR models. Once the infected are quarantined, we assume their probability of infecting susceptible individuals is zero; thus, whether or not the infected have recovered, they have no effect on the dynamics of the epidemic system.
The infection rate is the mean number of new infections caused by an un-quarantined infected individual per day. The subsequent confirmation rate is the rate for those infected who are not confirmed by the conventional methods but are confirmed with some additional tests. If no other special approaches are used, it is set to 0. Combining the two sources of confirmation gives the total confirmation rate.
The confirmation rate of the un-quarantined infected applies to those who can be identified as confirmed infections without being quarantined. We thus set up a set of ODEs to model the dynamics of an infectious disease and its control by artificial factors (Eqn. 1). In the model, U goes to C either directly or indirectly through Q. In fact, the former can be viewed as a special case of the latter with zero delay time in the Q → C transition. Thus we delete the direct path and simplify the model as Eqn. 2. From the above SUQC model, we can further define some biologically meaningful parameters for monitoring and predicting the trend of the disease. Some parameters can be calculated beforehand using the public data directly. We calculate the infection rate using the confirmed infected numbers of Wuhan city between Jan 20th and Jan 27th. By fitting an exponential curve, we get an infection rate of 0.2967.
The loss function is then optimized with the interior-point method implemented in the MATLAB function fmincon to infer the parameter values. We tried different initial values in parameter optimization and found that the inferred parameter values are not sensitive to the initial values provided. Note that the loss function (Eqn. 3) may give too much weight to later observations, since the cumulative case numbers are higher than on earlier days. We tried two other weighted loss functions to better integrate information across the whole epidemic (Eqns. S1 and S2 in the Supplementary Information), and compared the predictions of the loss functions. The predictions appear robust to the choice of loss function (Figures S5 and S6). In practice, the loss function may be chosen by its performance on the test data.
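To make the model structure and the fitting step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the right-hand side is one plausible reading of the verbal description (only U infects S, U moves to Q at a quarantine rate, Q moves to C at a total confirmation rate), and the names `alpha`, `gamma`, `beta` and `n_pop` are hypothetical. The squared-error loss on cumulative confirmed cases is assumed, and a coarse grid search stands in for the interior-point optimization in MATLAB's fmincon. The initial state is treated as known, which is a simplification.

```python
"""Sketch of a simplified SUQC-style compartment model.

ASSUMPTIONS (not taken from the paper's equations): the form of the
right-hand side, the names alpha/gamma/beta/n_pop, the squared-error loss,
and the grid search that stands in for MATLAB's fmincon.
"""


def suqc_rhs(state, alpha, gamma, beta, n_pop):
    """Time derivatives of (S, U, Q, C) under the assumed model."""
    s, u, q, _ = state
    new_infections = alpha * u * s / n_pop  # only un-quarantined U infects S
    return (
        -new_infections,              # dS/dt
        new_infections - gamma * u,   # dU/dt: infected, then quarantined
        gamma * u - beta * q,         # dQ/dt: quarantined, then confirmed
        beta * q,                     # dC/dt: cumulative confirmed cases
    )


def simulate(state0, alpha, gamma, beta, n_pop, days, steps_per_day=20):
    """Integrate with classical RK4; return one state per day, day 0 first."""
    h = 1.0 / steps_per_day
    args = (alpha, gamma, beta, n_pop)

    def shifted(state, k, scale):
        return tuple(x + scale * dx for x, dx in zip(state, k))

    state = tuple(float(x) for x in state0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = suqc_rhs(state, *args)
            k2 = suqc_rhs(shifted(state, k1, h / 2), *args)
            k3 = suqc_rhs(shifted(state, k2, h / 2), *args)
            k4 = suqc_rhs(shifted(state, k3, h), *args)
            state = tuple(
                x + h * (a + 2 * b + 2 * c + d) / 6
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        daily.append(state)
    return daily


def squared_loss(observed_c, state0, alpha, gamma, beta, n_pop):
    """Assumed loss: sum of squared errors on daily cumulative confirmed C."""
    daily = simulate(state0, alpha, gamma, beta, n_pop, days=len(observed_c) - 1)
    return sum((obs - st[3]) ** 2 for obs, st in zip(observed_c, daily))


def grid_fit(observed_c, state0, alpha, n_pop, grid):
    """Pick (gamma, beta) from a grid by minimum loss; alpha is held fixed."""
    candidates = (
        (squared_loss(observed_c, state0, alpha, g, b, n_pop), g, b)
        for g in grid
        for b in grid
    )
    _, gamma, beta = min(candidates)
    return gamma, beta


def reproductive_number(alpha, gamma):
    """Under the assumed model: infections per day times mean days spent in U."""
    return alpha / gamma


if __name__ == "__main__":
    # Synthetic data only: generated from the sketch itself, not real case counts.
    n_pop = 1.0e7
    state0 = (n_pop - 600.0, 500.0, 100.0, 0.0)
    synthetic = [st[3] for st in simulate(state0, 0.2967, 0.4, 0.1, n_pop, days=14)]
    grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    print(grid_fit(synthetic, state0, 0.2967, n_pop, grid))
```

Under this assumed form, the ratio of the infection rate to the quarantine rate plays the role of a reproductive number. With the infection rate 0.2967 and the quarantine rates 0.3917 and 0.6185 quoted in the text, `reproductive_number` would give roughly 0.76 and 0.48, both below 1, which is at least consistent with the reported R<1 after January 30. Whether this matches the authors' own definition cannot be confirmed from the text available here.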
with stage Ⅰ data, the first 10 data points (from Jan 20th) are used to infer the parameters, and the remaining points are used to test the model; (E) prediction using stage Ⅲ data; (F) model-fitting and testing with stage Ⅲ data. The first 7 data points (from Feb 13th) are used to infer the parameters, and the remaining points are used to test the model.
prediction using stage Ⅱ data; (B) model-fitting and testing with stage Ⅱ data. The first 15 data points (from Jan 28th) are used to infer the parameters, and the remaining points are used to test the model performance; (C) prediction using stage Ⅰ data; (D) model-fitting and testing with stage Ⅰ data. The first 10 data points (from Jan 20th) are used to infer the parameters, and the remaining points are used to test the model.
Model-fitting and testing with stage Ⅰ data. The first 10 data points (from Jan 20th) are used to optimize the parameters, and the remaining points are used to test the model.
Wuhan Municipal Health Commission briefing on the pneumonia epidemic situation 31
A novel coronavirus outbreak of global health concern
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: Lancet
Infectious Disease Specialists. Prevention and control of novel coronavirus pneumonia (The Fifth Edition)
Risk for transportation of 2019 novel coronavirus disease from Wuhan to other cities in China. Emerg Infect Dis
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Simulation of an SEIR infectious disease model on the dynamic contact network of conference attendees
Early Epidemic Dynamics of the West African Estimates Derived with a Simple Two-Parameter Model
Transmissibility of 2019-n-CoV. Available at
We are grateful to Drs. Yongbiao Xue and Liping Wang for motivating this project, and to Dr. Hongyu Zhao and the anonymous reviewers for their valuable comments. This project was supported by the National Natural Science Foundation of China (Grant No. 31571370, 91631106, and 91731302), the "Strategic Priority Research Program" of the