key: cord-327096-m87tapjp authors: Peng, Liangrong; Yang, Wuyue; Zhang, Dongyan; Zhuge, Changjing; Hong, Liu title: Epidemic analysis of COVID-19 in China by dynamical modeling date: 2020-02-18 journal: nan DOI: 10.1101/2020.02.16.20023465 sha: doc_id: 327096 cord_uid: m87tapjp

The outbreak of the novel coronavirus (2019-nCoV) epidemic has attracted worldwide attention. Herein, we propose a mathematical model to analyze this epidemic, based on a dynamic mechanism that incorporates the intrinsic impact of hidden latent and infectious cases on the entire process of transmission. The model is validated by data correlation analysis, by predicting recent public data, by backtracking, and by sensitivity analysis. The dynamical model reveals the impact of various measures on the key parameters of the epidemic. According to the public data of NHCs from 01/20 to 02/09, we predict the epidemic peak and possible end time for 5 different regions. The epidemics in Beijing and Shanghai are expected to end before the end of February, and those in Mainland/Hubei and Hubei/Wuhan before mid-March. The model indicates that the outbreak in Wuhan will end in early April. As a result, more effective policies and more efforts on clinical research are demanded. Moreover, through the backtracking simulation, we infer that the outbreak of the epidemic in Mainland/Hubei, Hubei/Wuhan, and Wuhan can be dated back to the end of December 2019 or the beginning of January 2020.

A novel coronavirus, formerly called 2019-nCoV and named SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) by the International Committee on Taxonomy of Viruses (ICTV), caused an outbreak of atypical pneumonia, now officially called COVID-19 (coronavirus disease 2019) by the World Health Organization (WHO). The outbreak was first reported in Wuhan, Hubei province, in December 2019 and then spread rapidly across China [1]. As of 24:00 Feb.
13th, 2020 (Beijing Time), there are over 60,000 reported cases (including more than 1,000 deaths) in China, of which over 80% are from Hubei province and over 50% from Wuhan city, the capital of Hubei province [2,3]. The central government of China as well as all local governments, including Hubei, have tightened preventive measures to curb the spread of COVID-19 since January 2020. Many cities in Hubei province have been locked down and many measures have been taken, such as tracing close contacts, quarantining infected cases, and promoting social consensus on self-protection, like wearing face masks in public areas. However, at the time of completing this manuscript, the epidemic is still ongoing and the number of daily confirmed cases remains high.

During this anti-epidemic battle, besides medical and biological research, theoretical studies based on either statistics or mathematical modeling may also play a non-negligible role in understanding the epidemic characteristics of the outbreak, in forecasting the inflection point and ending time, and in deciding the measures to curb the spread. For this purpose, in the early stage many efforts were devoted to estimating key epidemic parameters, such as the basic reproduction number, doubling time and serial interval, mainly by means of statistical models [4-9]. Due to the limitations of detection methods and restrictive diagnostic criteria, asymptomatic or mild patients are possibly excluded from the confirmed cases. To address this, some methods have been proposed to estimate untraced contacts [10], undetected international cases [11], the actual infected cases in Wuhan and Hubei province based on statistical models [12], or the epidemic outside Hubei province and overseas [6,13-15].
With the improvement of clinical treatment of patients as well as stricter measures stepped up to contain the spread, many researchers have investigated the effect of such changes by statistical reasoning [16,17] and stochastic simulation [18,19]. Compared with statistical methods [20,21], mathematical modeling based on dynamical equations [15,22-24] has received relatively less attention, though it can provide a more detailed mechanism for the epidemic dynamics. Among such models, the classical susceptible-exposed-infectious-recovered (SEIR) model is the most widely adopted one for characterizing the COVID-19 outbreak in both China and other countries [25]. Based on the SEIR model, one can also assess the effectiveness of various measures taken since the outbreak [23,24,26-28], which seems to be a difficult task for general statistical methods. The SEIR model was also used to compare the effects of the lockdown of Hubei province on the transmission dynamics in Wuhan and Beijing [29]. As the dynamical model can reach interpretable conclusions on the outbreak, a cascade of SEIR models has been developed to simulate the processes of transmission from the infection source, hosts and reservoir to humans [30]. There are also notable generalizations of the SEIR model for evaluating the transmission risk and predicting the number of patients, in which each group is divided into two subpopulations, the quarantined and the unquarantined [23,24]. The extension of the classical SEIR model with delays [31,32] is another route to simulate the incubation period and the period before recovery.

All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. medRxiv preprint doi: https://doi.org/10.1101/2020.02.16.20023465
However, due to the lack of official data and the change of diagnostic criteria in the early stage of the outbreak, most early published models were either too complicated to avoid the overfitting problem, or had parameters estimated from limited and less accurate data, resulting in questionable predictions. In this work, we carefully collect the epidemic data from the authoritative sources: the [...] Such a design aims to minimize the influence of Hubei province and Wuhan city on the data set due to their extremely large infected populations compared to other regions. Unless otherwise stated, these conventions are adopted throughout the paper. [...] in progress.

A. Generalized SEIR model

The model involves seven states, {S(t), P(t), E(t), I(t), Q(t), R(t), D(t)}, denoting at time t the respective numbers of susceptible cases, insusceptible cases, exposed cases (infected but not yet infectious, in a latent period), infectious cases (with infectious capacity and not yet quarantined), quarantined cases (confirmed and infected), recovered cases and closed cases (or deaths). The addition of a new quarantined state is driven by data; together with the recovery state, it replaces the original R state in the classical SEIR model. The relations among the states are given in Fig. 1 and characterized by a group of ordinary differential equations (or difference equations if we consider discrete time, see SI). The constant N = S + P + E + I + Q + R + D is the total population in a certain region. The coefficients {α, β, γ⁻¹, δ⁻¹, λ(t), κ(t)} represent the protection rate, infection rate, average latent time, average quarantine time, cure rate, and mortality rate, respectively.
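
The state and coefficient definitions above can be turned into a small simulation. The sketch below is only an illustration under stated assumptions: the governing equations themselves (Fig. 1 and the SI) are not reproduced in this text, so the flow terms, in particular the standard incidence term βSI/N, are assumed from the definitions rather than taken from the authors' equations. The function names, the forward-Euler scheme, the initial values and all parameter values are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def step(state: State, alpha: float, beta: float, gamma: float, delta: float,
         lam: float, kappa: float, dt: float) -> State:
    """Advance the seven compartments by one forward-Euler step of dt days."""
    S, P, E, I, Q, R, D = (state[k] for k in "SPEIQRD")
    N = S + P + E + I + Q + R + D   # total population, conserved by the flows
    new_inf = beta * S * I / N      # S -> E (assumed standard incidence term)
    protect = alpha * S             # S -> P, protection rate alpha
    onset = gamma * E               # E -> I, latent time 1/gamma
    quarant = delta * I             # I -> Q, quarantine time 1/delta
    cured = lam * Q                 # Q -> R, cure rate lambda(t)
    died = kappa * Q                # Q -> D, mortality rate kappa(t)
    return {
        "S": S - dt * (new_inf + protect),
        "P": P + dt * protect,
        "E": E + dt * (new_inf - onset),
        "I": I + dt * (onset - quarant),
        "Q": Q + dt * (quarant - cured - died),
        "R": R + dt * cured,
        "D": D + dt * died,
    }


def simulate(init: State, days: int, alpha: float, beta: float, gamma: float,
             delta: float, lam: Callable[[float], float],
             kappa: Callable[[float], float], dt: float = 0.1) -> List[State]:
    """Return one state per day; lam and kappa are functions of time in days."""
    state = dict(init)
    daily = [dict(state)]
    per_day = round(1 / dt)
    for day in range(days):
        for k in range(per_day):
            t = day + k * dt
            state = step(state, alpha, beta, gamma, delta, lam(t), kappa(t), dt)
        daily.append(dict(state))
    return daily


# Hypothetical initial condition and parameters, chosen only for illustration.
init = {"S": 9_999_000.0, "P": 0.0, "E": 400.0, "I": 400.0,
        "Q": 200.0, "R": 0.0, "D": 0.0}
history = simulate(init, days=30, alpha=0.1, beta=1.0, gamma=0.5, delta=0.15,
                   lam=lambda t: 0.02, kappa=lambda t: 0.01)
print(round(history[-1]["Q"]))  # quarantined cases after 30 simulated days
```

Because every outflow of one compartment is the inflow of another, the sum of the seven states stays equal to N, which is a quick sanity check on any implementation of the model.
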
In particular, to take into account the improvement of public health measures, such as promoting the wearing of face masks, more effective contact tracing and stricter lockdown of communities, we assume that the susceptible population is steadily decreasing and thus introduce a positive protection rate α into the model. In this case, the basic reproduction number [...]

Note that here we assume the cure rate λ and the mortality rate κ are both time dependent. As confirmed in Fig. 2a-d, the cure rate λ(t) gradually increases with time, while the mortality rate κ(t) quickly decreases to less than 1% and stabilizes after Jan. 30th. This phenomenon is likely due to the assistance of other emergency medical teams, the application of new drugs, etc. Furthermore, the average contact number of an infectious person is calculated in Fig. 2e-f and could provide some clue about the infection rate. The average contact number is clearly stable over time, but shows a remarkable difference among regions, which could be attributed to different quarantine policies and their implementation inside and outside Hubei (or Wuhan), since a less severely affected region is more likely to trace the close contacts of a confirmed case. A similar regional difference is also observed for the severe condition rate. In Fig. 2g-h, Hubei and Wuhan overall show a much higher severe condition rate than Shanghai. Although it is generally expected that patients need a period of time to become infectious, to be quarantined, or to recover from illness, we do not find strong evidence for the necessity of including a time delay (see SI for more details).
As a result, the time-delayed equations are not considered in the current work for simplicity.

According to the daily official reports of the NHC of China, the cumulative numbers of quarantined cases, recovered cases and closed cases are publicly available. However, since the latter two are directly related to the first one through the time-dependent recovery rate and mortality rate, the number of quarantined cases Q(t) plays a key role in our modeling. A similar argument applies to the number of insusceptible cases too. Furthermore, as the accurate numbers of exposed cases and infectious cases are very hard to determine, they are treated as hidden variables during the study. Leaving aside the time-dependent parameters λ(t) and κ(t), there are four unknown coefficients {α, β, γ⁻¹, δ⁻¹} and two initial conditions {E₀, I₀} for the hidden variables (other initial conditions are known from the data) that have to be extracted from the time series data {Q(t)}. Such an optimization problem can be solved automatically by using the simulated annealing algorithm (see SI for details). A major difficulty is how to overcome the overfitting problem. To this end, we first fix the latent time γ⁻¹ in advance, which is generally estimated to be within several days [5,33,34]. Then, for each fixed γ⁻¹, we explore its influence on the other parameters (β = 1 remains nearly unchanged), the initial values, and the population dynamics of quarantined cases and infected cases during best fitting. From Fig.
3a-b, to produce the same outcome, the protection rate α and the reciprocal of the quarantine time δ⁻¹ both decrease with the latent time γ⁻¹, which is consistent with the fact that a longer latent time requires a longer quarantine time. Meanwhile, the initial values of exposed cases and infectious cases increase with the latent time. Since E₀ and I₀ include asymptomatic patients, they should both be larger than the number of quarantined cases. Furthermore, as the time period between the starting date of our simulation (Jan. 20th) and the initial outbreak of COVID-19 (generally believed to be earlier than Jan. 1st) is much longer than the latent time (3-6 days), E₀ and I₀ have to be close to each other, so that only their sum E₀ + I₀ matters during the fitting. An additional important finding is that in all cases β is very close to 1, which agrees with the observation that COVID-19 has an extremely strong infectious ability: nearly every unprotected person will be infected after direct contact with COVID-19 patients [5,33,34]. In summary, we conclude that once the latent time γ⁻¹ is fixed, the fitting accuracy on the time series data {Q(t)} basically depends on the values of α, δ⁻¹ and E₀ + I₀. Based on a reasonable estimation of the total number of infected cases (see Fig. 3c-d), the latent time is finally determined as 2 days.

In order to further evaluate the influence of other fitting parameters on the long-term forecast, we perform sensitivity analysis on the data of Wuhan (results for other regions are similar and not shown) by systematically varying the values of the unknown coefficients [35,36]. As shown in Fig.
3e-f, the predicted total infected cases at the end of the epidemic, as well as the inflection point, at which the basic reproduction number is less than 1 [6], both show a positive correlation with the infection rate β and the quarantine time δ⁻¹ and a negative correlation with the protection rate α. These facts agree with common sense and highlight the necessity of self-protection (increase α and decrease β), timely disinfection (increase α and decrease β), early quarantine (decrease δ⁻¹), etc. An exception is found for the initial total infected cases. Although a larger value of E₀ + I₀ could substantially increase the final total infected cases, it shows no impact on the inflection point, which can be seen from the formula of the basic reproduction number.

We apply the generalized SEIR model described above to interpret the public data on the cumulative numbers of quarantined cases, recovered cases and closed cases from Jan. 20th to Feb. 9th, which have been published daily by the NHC of China since Jan. 20th. Our preliminary study includes five different regions, i.e. Mainland*, Hubei*, Wuhan, Beijing and Shanghai. Through extensive simulations, the optimal values for the unknown model parameters and initial conditions, which best explain the observed cumulative numbers of quarantined cases, recovered cases and closed cases (see Fig. 4), are determined and summarized in Table 1.

Several remarkable facts can be immediately learnt from Table 1. Firstly, the protection rate of Wuhan is significantly lower than that of other regions, showing that many infected cases may not yet be well quarantined until Feb.
9th (the smaller α for Wuhan does not necessarily mean that people in Wuhan pay less attention to self-protection; it is more likely due to the higher mixing ratio of susceptible cases with infectious cases). Similarly, although the average protection rate for Hubei* is higher than that of Wuhan, it is still significantly lower than that of other regions. Secondly, the quarantine times for Beijing and Shanghai are the shortest, and that for Mainland* is in between. Again, the quarantine times for Wuhan and Hubei* are the longest. Finally, the estimated numbers of total infected cases on Jan. 20th in the five regions are all significantly larger than one, suggesting that COVID-19 had already spread nationwide at that moment. We will come back to this point in the next part.

[...] the initial values for exposed cases and infectious cases, respectively. The time-dependent cure rate λ(t) and mortality rate κ(t) can be read out from Fig. 2 and are given in SI.

Most importantly, with the model and parameters in hand, we can carry out simulations for a longer time and forecast the potential tendency of the COVID-19 epidemic. In Fig. 4 and Fig. 5a-b, the predicted cumulative number of quarantined cases and the current number of exposed cases plus infectious cases are plotted for the next 30 days as well as for a shorter period of the next 13 days. Official data published by the NHC of China from Feb. 10th to 15th are marked as red spots and taken as a direct validation. Overall, except for Wuhan, the validation data show good agreement with our forecast and all fall into the 95% confidence interval (shaded area).
We are delighted to see that most of them are lower than our predictions, showing that the nationwide anti-epidemic measures in China have come into play. For Wuhan city (and also Hubei province), there is a sudden jump in the quarantined cases, due to the inclusion of suspected cases with clinical diagnosis into confirmed cases (12364 cases for Wuhan and 968 cases for Hubei* on Feb. 12th), announced by the NHC of China since Feb. 12th during the preparation of our manuscript. Although it to some extent offsets our original overestimates, it also reveals the current severe situation in Wuhan city, which requires much closer attention in the future.

Regarding the epidemic of COVID-19, our basic predictions are summarized as follows:

1. Based on optimistic estimation, the epidemic of COVID-19 in Beijing and Shanghai would end within two weeks (from Feb. 15th). For most parts of the mainland, the success of the anti-epidemic effort will come no later than the middle of March. The situation in Wuhan is still very severe, at least based on public data until Feb. 15th. We expect it to end at the beginning of April.

[...] are not included into parameter estimation). By coincidence, on the same day, we witnessed a sudden jump in the number of confirmed cases due to relaxed diagnostic criteria, meaning that more suspected cases will receive better medical care and have much lower chances of spreading the virus. Besides, the Wuhan local government announced the completion of a community survey on all confirmed cases, suspected cases and close contacts in the whole city.
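
The fit-then-forecast workflow used in this section (estimate the unknown coefficients from the series {Q(t)}, then run the model forward) can be sketched as follows. This is a minimal illustration and not the authors' implementation, whose details are in the SI: it assumes a daily difference-equation form of the model with a standard incidence term, fixes β = 1 and a 2-day latent time, sets I₀ = E₀, uses constant cure and mortality rates, and fits synthetic data generated by the same model. The search bounds, the cooling schedule and all numerical values are hypothetical.

```python
import math
import random


def simulate_q(alpha, delta, e0, days, n=1.0e7, beta=1.0, gamma=0.5,
               lam=0.02, kappa=0.005, q0=100.0):
    """Daily difference-equation sketch of the model; returns Q(0..days)."""
    s, e, i, q = n - 2 * e0 - q0, e0, e0, q0   # assumes I0 = E0
    series = [q]
    for _ in range(days):
        new_inf = beta * s * i / n             # assumed incidence term
        s, e, i, q = (s - new_inf - alpha * s,
                      e + new_inf - gamma * e,
                      i + gamma * e - delta * i,
                      q + delta * i - (lam + kappa) * q)
        series.append(q)
    return series


def loss(theta, observed):
    """Sum of squared differences between simulated and observed Q(t)."""
    alpha, delta, e0 = theta
    sim = simulate_q(alpha, delta, e0, len(observed) - 1)
    return sum((a - b) ** 2 for a, b in zip(sim, observed))


# Hypothetical search box for (alpha, delta, E0).
BOUNDS = [(0.0, 0.5), (0.01, 1.0), (1.0, 5000.0)]


def anneal(observed, start, steps=4000, t0=1.0, seed=0):
    """Basic simulated annealing with Gaussian moves and linear cooling."""
    rng = random.Random(seed)
    current, cur_loss = list(start), loss(start, observed)
    best, best_loss = list(current), cur_loss
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9
        cand = [min(hi, max(lo, x + rng.gauss(0.0, 0.05 * (hi - lo))))
                for x, (lo, hi) in zip(current, BOUNDS)]
        cand_loss = loss(cand, observed)
        # Metropolis rule applied to the relative change in loss.
        accept = cand_loss < cur_loss or rng.random() < math.exp(
            -(cand_loss - cur_loss) / (temp * max(cur_loss, 1e-12)))
        if accept:
            current, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = list(current), cur_loss
    return best, best_loss


# Synthetic "observations" from known parameters (not real epidemic data).
observed = simulate_q(0.08, 0.15, 400.0, 20)
start = [0.2, 0.5, 1000.0]
best, best_loss = anneal(observed, start)
forecast = simulate_q(best[0], best[1], best[2], 50)  # 30 days past the data
print(best, best_loss)
```

Fitting only α, δ and a single initial value mirrors the observation in the text that, once the latent time is fixed, the fit to {Q(t)} is essentially controlled by α, δ⁻¹ and E₀ + I₀; fitting more parameters from a single series invites the overfitting problem discussed above.
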
Besides the forecast, the early trajectory of the COVID-19 outbreak is also critical for our understanding of the epidemic as well as for future prevention. To this end, by adopting the shooting method, we carry out inverse inference to explore the early epidemic dynamics of COVID-19 since its onset in Mainland*, Hubei*, and Wuhan (Beijing and Shanghai are not considered because their numbers of infected cases on Jan. 20th are too small). With the parameters and initial conditions listed in Table 1, we make the astonishing finding that, for all three cases, the outbreaks of COVID-19 point to 20-25 days before Jan. 20th (the starting date for the public data and our modeling). This means that the epidemic of COVID-19 in these regions started no later than Jan. 1st (see Fig. 5d), in agreement with reports by Li et al. [5,33,34]. In this stage (from Jan. 1st to Jan. 20th), the number of total infected cases follows a nice exponential curve with a doubling time of around 2 days. This in some way explains why statistical studies with either exponential functions or logistic models could work very well on early limited data points. Furthermore, we notice that the number of infected cases based on inverse inference is much larger than the reported confirmed cases in Wuhan city before Jan. 20th.

In this study, we propose a generalized SEIR model to analyze the epidemic of COVID-19, which was first reported in Wuhan last December and then quickly spread nationwide in China. Our model properly incorporates the intrinsic impact of hidden exposed and infectious cases on the entire course of the epidemic, which is difficult for traditional statistical analysis.
A new quarantined state, together with the recovery state, replaces the original R state in the classical SEIR model and correctly accounts for the daily reported confirmed infected cases and recovered cases. Based on detailed analysis of the public data of the NHC of China from Jan. 20th to Feb. 9th, we estimate several key parameters for COVID-19, like the latent time, the quarantine time and the basic reproduction number, in a relatively reliable way, and predict the inflection point, possible ending time and final total infected cases for Hubei, Wuhan, Beijing, Shanghai, etc. Overall, the epidemic situations for Beijing and Shanghai are optimistic, and the epidemic there is expected to end within two weeks (from Feb. 15th, 2020). Meanwhile, for most parts of the mainland, including the majority of cities in Hubei province, it will end no later than the middle of March. We should also point out that the situation in Wuhan city is still very severe. More effective policies and more efforts on medical care and clinical research are eagerly needed. We expect that the final success of the anti-epidemic effort will be reached at the beginning of this April. Furthermore, by inverse inference, we find that the outbreak of this epidemic in Mainland, Hubei, and Wuhan can all be dated back to 20-25 days before Jan. 20th, in other words the end of Dec. 2019, which is consistent with public reports. Although we lack knowledge of the first infected case, our inverse inference may still be helpful for understanding the epidemic of COVID-19 and preventing similar viruses in the future.

The authors declare no conflict of interest.
Epidemic doubling time of the 2019 novel coronavirus outbreak by province in mainland China. medRxiv
Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China
Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China
The novel coronavirus, 2019-nCoV, is highly contagious and more infectious than initially estimated. medRxiv
Serial interval of novel coronavirus (2019-nCoV) infections. medRxiv
Assessing spread risk of Wuhan novel coronavirus within and beyond China
Using predicted imports of 2019-nCoV cases to determine locations that may not be identifying all imported cases. medRxiv
Epidemic size of novel coronavirus-infected pneumonia in the epicenter Wuhan: using data of five-countries' evacuation action. medRxiv
Estimating the daily trend in the size of COVID-19 infected population in Wuhan. medRxiv
Estimation of the asymptomatic ratio of novel coronavirus (2019-nCoV) infections among passengers on evacuation flights
Early dynamics of transmission and control of 2019-nCoV: a mathematical modelling study. medRxiv
The effect of travel restrictions on the spread of the 2019 novel coronavirus (2019-nCoV) outbreak. medRxiv
The impact of traffic isolation in Wuhan on the spread of 2019-nCoV. medRxiv
Feasibility of controlling 2019-nCoV outbreaks by isolation of cases and contacts
Effectiveness of airport screening at detecting travellers infected with 2019-nCoV. medRxiv
Predictions of 2019-nCoV transmission ending via comprehensive methods
A data driven time-dependent transmission rate for tracking an epidemic: a case study of 2019-nCoV
Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions. medRxiv
Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions
An updated estimation of the risk of transmission of the novel coronavirus (2019-nCoV). Infectious Disease Modelling
Transmission dynamics of 2019-nCoV in Malaysia. medRxiv
Lockdown may partially halt the spread of 2019 novel coronavirus in Hubei province
Interventions targeting air travellers early in the pandemic may delay local outbreaks of SARS-CoV-2. medRxiv
Simulating the infected population and spread trend of 2019-nCoV under different policy by EIR model. medRxiv
The lockdown of Hubei province causing different transmission dynamics of the novel coronavirus (2019-nCoV) in Wuhan and Beijing. medRxiv
Jing-An Cui, and Ling Yin. A mathematical model for simulating the transmission of Wuhan novel coronavirus. bioRxiv
A time delay dynamical model for outbreak of 2019-nCoV and the parameter identification
Modeling and prediction for the trend of outbreak of NCP based on a time-delay dynamic system
Partial equilibrium approximations in apoptosis. II. The death-inducing signaling complex subsystem
Chiu Fan Lee, and Ya Jing Huang. Statistical Mechanics and Kinetics of Amyloid Fibrillation

We acknowledge the financial support from the National Natural Science Foundation [...]