key: cord-0727316-t2x4yfrv authors: Sun, Haoxuan; Qiu, Yumou; Yan, Han; Huang, Yaxuan; Zhu, Yuru; Chen, Song Xi title: Tracking and Predicting COVID-19 Epidemic in China Mainland date: 2020-02-20 journal: nan DOI: 10.1101/2020.02.17.20024257 sha: 341fe593ea84b66223d50924bddbba252c37bc4e doc_id: 727316 cord_uid: t2x4yfrv By proposing a varying coefficient Susceptible-Infected-Removal model (vSIR), we track the epidemic of COVID-19 in 30 provinces in China and 15 cities in Hubei province, the epicenter of the outbreak. It is found that the spread of COVID-19 has been significantly slowing down within the two weeks from January 27 to February 10th with 87.0% and 84.3% reductions in the reproduction number R0 among the 30 provinces and 15 Hubei cities, respectively. This suggests the extreme control measures implemented since January 23, which include cutting off Wuhan and many other cities and towns, a great public awareness and high level of self isolation at home, have contributed to a substantial decline in the reproductivity of the COVID-19 in China. We predict that Hubei province will reach its peak between February 20 and 22, 2020, and if the removal rate can be increased to 0.1, the epidemic outside Hubei province will end in May 2020, and inside Hubei in early June. infectious disease outbreaks. See [4, 5, 6, 7] for statistical estimation and 25 inference for stochastic versions of the SIR model. SEIR models have been 26 used to produce early results on 9, 10] , which produced 27 the first three estimates of the all important basic reproduction number R 0 : 28 2.68 by [8] , 3.81 by [9] and 6.47 by [10] . The R 0 is the expected number 29 of infections by one infectious person during the course of his/her infectious 30 period, and is a key measure of an epidemic. If R 0 < 1, the epidemic will die 31 down eventually with the speed of the decline depends on how small R 0 is; 32 otherwise, the epidemic will explode until it runs out of its course. 33 The SEIR models that was employed in the above three cited works for the By applying the vSIR model, we produce daily estimates of the infectious 52 rate β(t) and the reproduction number R 0 (t) (t denotes time) for 30 provinces 53 and 15 major cities (including Wuhan) in Hubei province from January 21 54 or a later date between January 24-29 depending on the first confirmed case 55 to February 10. We report standard error in the parentheses following the 56 estimate. is the daily infection rate at t. We 105 do not adopt the version involving γ, the removal rate, since its estimation 106 4 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 is highly volatile at the early stage of an epidemic. A general version of 107 R 0 (t) may be defined as . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 Qinghai. Even for those exceptional provinces, the recent trend is largely de-144 clining. The non-monotone pattern for non-Hubei provinces were largely due 145 to relative small number of infected cases and waves of introduced infections. Table 1 provides the reproduction number R D 0 at the three durations 166 on February 10th. It shows that 5 provinces and 4 Hubei cities' R 14 0 were 167 significantly above 1 (at 5% significance level). There are 17 provinces and 168 8 Hubei cities' R 14 0 were significantly below 1, which were 4 and 6 more 169 than those a day earlier on February 9th, and 9 and 8 more than those on 170 February 8th, respectively. If we use the shorter D = 10.5, 27 provinces 171 and 11 Hubei cities have been significantly below 1 for 1-7 consecutive days. These indicate that the reproduction number R 0 has showed signs of crossing 173 below the critical threshold 1 in increasing number of provinces and cities 174 in Hubei around February 8-10. An updated Table 1 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . tively; see Table A2 in SI for the detailed prediction error. The prediction 208 based on the most recent data to February 16 gives similar results. From Table 2 , with the estimated recovery rate, Hubei will reach peak in- It should be highlighted that the above prediction results were based on 223 the estimated recovery rate so far. Table 3 where Wuhan is the capital city. We did not consider data from Tibet due 248 to very small number of cases. Table A3 in SI provides the starting dates CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02.17.20024257 doi: medRxiv preprint case, and the analysis date started four days afterward due to the estimation 254 approach for estimating the infectious rate β(t). The latest start for analysis 255 was January 29th for Qinghai province and three cities in Hubei province. The second last date was January 28th that started 2 provinces and 5 Hubei 257 cities. The data from Shenzhen Government Online are epidemic statistics re- where β(t) and γ(t) are unknown functions of time. The rationale for using a time-varying β(t) function, rather than a con- . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https: //doi.org/10.1101 //doi.org/10. /2020 The removal rate will also change over time as treatments improve over time. However, our analysis ( Figure S4 in SI) shows γ(t) is much slowly changing 285 for most of the provinces, which led us to treat γ(t) = γ at current stage of 286 the outbreak. with 7/10 weight at t = 1 or T , and 3/10 for t = 2 or T − 1. Apply the same 294 filtering on the recovered process R(t) and obtainR(t). To simply the nota-295 tion, we denote the filtered dataĪ(t) andR(t) as I(t) and R(t) respectively, 296 wherever there is no confusion. Let ∆ δ R t = R t+δ − R t for t = 1, · · · , T − δ. From the third equation in (1), we estimate γ by least square fitting of ∆ δ R t on I(t) without intercept. We estimate β(t) − γ by a local linear regression on log{I(t)}. Letγ and β(t) − γ be the estimators, and Var(γ) and Var(β(t) − γ) be their estimated variances. Their close form expressions are provided in Section S.1 in SI. Then,β(t) = β(t) − γ +γ is the estimate for the varying coefficient β(t) in (1). The standard error ofβ(t) can be obtained as SE β (t) = { Var(β(t) − γ) + Var(γ)} 1/2 . The 95% confidence interval for β(t) can be constructed as (β(t) − 1.96SE β (t),β(t) + 1.96SE β (t)). (2) In the implementation, we chose δ = 2 and w = 5. Figure S1 in SI shows As R D 0 (t) = β(t) × D, predicting β(t) is equivalent to predicting R D 0 (t). From Figure 1 and Figure S2 in SI, we see that the overall trends of β(t) is decreasing. But the rate of deceasing gets smaller as time travels. To model such trend, we consider the reciprocal regression . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https: //doi.org/10.1101 //doi.org/10. /2020 with error e t and unknown parameters a, b and η. The parameters a, b 302 and η are estimated by minimizing the sum-of-square distance between the 303 estimatesβ(t) and their fitted values. Letã,b andη be the estimated 304 parameters, andβ(t) =b/(tη −ã) be the fitted function. Figure S3 in SI 305 shows the reciprocal model fitsβ(t) quite well for most of the provinces, 306 especially those with large number of infected cases. With the fittedβ(t), we project {S(t), I(t), R(t)} via the ODEs whereγ T is the estimated recovery rate at time T using the last five days' . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https: //doi.org/10.1101 //doi.org/10. /2020 CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . Table 1 : The reproduction number R D 0 at three infectious durations: D = 7, 10.5, 14, for the 30 mainland provinces and 15 cities in Hubei province on February 10th. The symbols + (−) indicate that the R 14 0 was significantly above (below) 1 at 5% level of statistical significance, and the numbers inside the square brackets were the consecutive days the R 14 0 were above or below 1. The columns headed with ∆R 0 , ∆R 0 (1 st ) and ∆R 0 (2 nd ) are the percentages of decline in the R 14 0 from the beginning of analysis to February 10th, to February 3rd, and the from February 3-10, respectively. CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 Table 2 : The 95% prediction intervals for the peak and ending times, and the final accumulative number of infected cases of COVID-19 epidemic in the 30 provinces based on data to Feb 13 2020 with the estimatedγ T . The last column lists the total infected cases (I(t) + R(t)) as Feb 13, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 . The grey vertical line indicates the current date of observation; the blue solid line plots the observed I(t) before Feb 13th; the blue dashed line gives the predicted I(t) with 95% prediction interval (blue shaded area) with the estimatedγ T ; the pink vertical line indicates the peak date of I(t); the red dashed line gives the predicted I(t) with 95% prediction interval (red shaded area) with fixed recovery rate γ = 0.1. . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 The 95% prediction intervals for the peak and ending times, and the final accumulative number of infected cases of COVID-19 epidemic in the 30 provinces based on data to