key: cord-0938622-ndnkhj42
authors: batista, milan
title: Estimation of the final size of the coronavirus epidemic by the logistic model
date: 2020-02-18
journal: nan
DOI: 10.1101/2020.02.16.20023606
sha: 8bd8158e4a47c65b12a462b49b4b8b4af1488dc3
doc_id: 938622
cord_uid: ndnkhj42

In this note, the logistic growth regression model is used to estimate the final size and the peak time of the coronavirus epidemic. Based on the available data, the estimated final size is about 90 000 cases, and the peak time was 10 Feb 2020.

This preprint (which was not peer-reviewed) is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. https://doi.org/10.1101/2020.02.16.20023606

The logistic growth model originates from population dynamics (Haberman 1998). The underlying assumption of the model is that the per capita rate of change in the number of cases decreases linearly with the number of cases. Hence, if $C$ is the number of cases and $t$ is the time, then the model is expressed as

$$\frac{dC}{dt} = rC\left(1 - \frac{C}{K}\right), \tag{1}$$

where $r$ is the infection rate and $K$ is the final epidemic size. If $C(0) = C_0 > 0$ is the initial number of cases, then the solution of (1) is

$$C = \frac{K}{1 + A e^{-rt}}, \qquad A = \frac{K - C_0}{C_0}.$$

The growth rate $dC/dt$ reaches its maximum when $d^2C/dt^2 = 0$, that is, at the peak time

$$t_p = \frac{\ln A}{r}.$$

At this time, the number of cases and growth rate are

$$C_p = \frac{K}{2}, \qquad \left(\frac{dC}{dt}\right)_p = \frac{rK}{4}.$$

Now, if $C_1, C_2, \ldots, C_n$ are the numbers of cases at times $t_1, t_2, \ldots, t_n$, then the final size predictions of the epidemic based on these data are $K_1, K_2, \ldots, K_n$. By using the Shanks transformation on the last three predictions, the predicted final epidemic size is

$$K = \frac{K_n K_{n-2} - K_{n-1}^2}{K_n - 2K_{n-1} + K_{n-2}}.$$

For the practical calculation of the parameters $K$ and $r$, we use the MATLAB functions lsqcurvefit and fitnlm.

The SIR model equations are

$$\frac{dS}{dt} = -\frac{\beta}{N} I S, \tag{1}$$

$$\frac{dI}{dt} = \frac{\beta}{N} I S - \gamma I, \tag{2}$$

$$\frac{dR}{dt} = \gamma I, \tag{3}$$

where $t$ is time, $S(t)$ is the number of susceptible persons at time $t$, $I(t)$ is the number of infected persons at time $t$, $R(t)$ is the number of recovered persons at time $t$, $\beta$ is the contact rate, and $1/\gamma$ is the average infectious period. Adding (1), (2), and (3) shows that the total population size $N$ is constant:

$$N = S + I + R = \text{const}. \tag{4}$$
The initial conditions are $S(0) = S_0$, $I(0) = I_0$, and $R(0) = R_0$. Eliminating $I$ from (1) and (3) yields

$$S = S_0 \exp\left(-\frac{\beta}{\gamma N}\left(R - R_0\right)\right). \tag{5}$$

In the limit $t \to \infty$, the number of susceptible people left, $S_\infty$, is

$$S_\infty = S_0 \exp\left(-\frac{\beta}{\gamma N}\left(R_\infty - R_0\right)\right), \tag{6}$$

where $R_\infty$ is the final number of recovered persons. As the final number of infected people is zero, we have, using (4),

$$N = S_\infty + R_\infty. \tag{7}$$

From this and (6), the equation for $R_\infty$ is

$$R_\infty = N - S_0 \exp\left(-\frac{\beta}{\gamma N}\left(R_\infty - R_0\right)\right). \tag{8}$$

To use the model, we must estimate the model parameters $\beta$, $\gamma$, and the initial values $S_0$ and $I_0$ from the available data (we set $R_0 = 0$ and $I_0 = C_0$). The available data are a time series of the total number of cases $C$, i.e.,

$$C = I + R. \tag{9}$$

We can estimate the parameters and initial values by minimizing the difference between the actual and predicted number of cases, i.e., by minimizing

$$\sum_{i=1}^{n} \left(C_i - \hat{C}_i\right)^2, \tag{10}$$

where $\hat{C}_i = I(t_i) + R(t_i)$ is the number of cases predicted by the model at time $t_i$.

The results of the logistic regression and the SIR model simulation are given in Tables 1 and 2, respectively. The comparison of the predicted final sizes is shown in Figure 1. Both methods converge, and with more data the discrepancy between the predicted values falls below 5%. From Table 1, we see that the peak of the epidemic was probably on 9 Feb 2020.
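The equation for the final number of recovered persons is implicit in $R_\infty$, so it has to be solved numerically. The following Python sketch does this by bisection, assuming the standard SIR final-size relation $R_\infty = N - S_0 \exp(-\beta (R_\infty - R_0)/(\gamma N))$. The function name `final_size` and all numerical values are illustrative assumptions; they are not taken from the tables of this note.

```python
import math


def final_size(beta, gamma, n, s0, r0=0.0, tol=1e-9):
    """Solve R_inf = N - S0 * exp(-beta * (R_inf - R0) / (gamma * N)) by bisection.

    Assumes N = S0 + I0 + R0 with I0 > 0, so that the residual is positive
    at R0 and negative at N, which brackets the root.
    """
    def residual(r):
        return n - s0 * math.exp(-beta * (r - r0) / (gamma * n)) - r

    lo, hi = r0, n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of (or at) mid
    return 0.5 * (lo + hi)


# Hypothetical values: population of 1000 with one initial infected person,
# contact rate 0.3 per day and an average infectious period of 10 days.
r_inf = final_size(beta=0.3, gamma=0.1, n=1000.0, s0=999.0)
print(f"final number of recovered persons: {r_inf:.1f}")
```

Bisection is chosen here only because it is robust for a bracketed root; any scalar root finder would serve equally well.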
In Figure 2, the time evolution of the cases is shown, where we can see good agreement between the models and the actual data. From Table 3, we see that the logistic regression model has a high coefficient of determination of 0.996, while the p-value (< 0.000) indicates that all the regression parameters are statistically significant.

In Tables 3 and 4, the iterated Shanks transformations for the predicted series of the final epidemic size are given. It appears that the predictions of the logistic model tend to a final size of 83231 cases, while the SIR model predictions converge to 83640 cases. Thus, the discrepancy is less than 0.5%.

Table 4. Iterated Shanks transformation for the SIR model.

The models used are data-driven, so they are only as reliable as the data. Namely, as can be seen from the graph in Figure 2, at the beginning we have exponential growth. Then, until 11 Feb, one can predict a final epidemic size of about 55000 cases. However, the method of data collection then changes, and we have a jump of about 15000 new cases on 12 Feb. On 20 Feb we have another change in trend: the data begin to show an almost linear trend (see Fig. 3). While the above models show that the epidemic is slowing down, the linear trend predicts about 873 new cases per day (see Table 5).
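The iterated Shanks transformation applied to the series of final-size predictions can be sketched in Python as follows. Given three consecutive predictions $a$, $b$, $c$, one pass replaces them with $(ca - b^2)/(c - 2b + a)$, and the pass is repeated on the resulting shorter sequence. The function names, the handling of a zero denominator, and the input sequence are assumptions made for illustration; the sequence is synthetic and is not the data of Tables 3 and 4.

```python
def shanks(seq):
    """Apply one Shanks pass; a sequence of n terms yields n - 2 terms."""
    out = []
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        denom = c - 2.0 * b + a
        if denom == 0.0:
            # Assumption: when the three terms are equally spaced (for
            # example, already converged), keep the latest term.
            out.append(c)
        else:
            out.append((c * a - b * b) / denom)
    return out


def iterated_shanks(seq):
    """Return the table of successive Shanks passes, starting with seq."""
    table = [list(seq)]
    while len(table[-1]) >= 3:
        table.append(shanks(table[-1]))
    return table


# Synthetic sequence 100 - 50 * 0.5**k, which converges geometrically to 100.
predictions = [50.0, 75.0, 87.5, 93.75, 96.875]
for row in iterated_shanks(predictions):
    print(row)
```

For a sequence that approaches its limit geometrically, a single pass already recovers the limit, which is why the transformation is useful for extrapolating a slowly settling series of estimates.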
On the basis of the available data, we can now predict that the final size of the coronavirus epidemic according to the logistic model will be approximately 83700 ± 1300 cases and that the peak of the epidemic was on 9 Feb 2020. A more optimistic final size of 83300 cases is obtained using the Shanks transformation. Similar figures are obtained using the SIR model, where the predicted size of the epidemic is approximately 84500, and the Shanks transformation lowers this number to about 83700 cases. Naturally, the accuracy of these estimates remains to be seen. In conclusion, qualitatively, both models show that the epidemic is moderating, but recent data show a linear upward trend. The next few days will, therefore, indicate in which direction the epidemic is heading.

PS. Today it is more or less clear that the predictions of the article apply only to China. By February 20, 99% of the cases were from China. The linear trend in the data from Feb 20 onward meant a decreasing number of infected in China and an increasing number of infected elsewhere in the world. In other words, in China the epidemic is slowing down; however, it is now developing elsewhere in the world. We note that the forecasting methods used in this article are inapplicable in the early stages of an epidemic.

Data-Based Analysis, Modelling and Forecasting of the novel Coronavirus (2019-nCoV) outbreak
Advanced mathematical methods for scientists and engineers I: Asymptotic methods and perturbation theory
Early estimates of epidemic final sizes
The Final Size of a Serious Epidemic
West Africa Approaching a Catastrophic Phase or is the 2014 Ebola Epidemic Slowing Down? Different Models Yield Different Answers for Liberia
Computing applications to differential equations modelling in the physical and social sciences
Early Epidemic Dynamics of the West African 2014 Ebola Outbreak: Estimates Derived with a Simple Two-Parameter Model
Mathematical models: mechanical vibrations, population dynamics, and traffic flow: an introduction to applied mathematics. Unabridged republication ed., Classics in applied mathematics
The Mathematics of Infectious Diseases
A note on the derivation of epidemic final sizes
Mathematical biology
Statistics based predictions of coronavirus 2019-nCoV spreading in mainland China
Using phenomenological models for forecasting the 2015 Ebola challenge
Short-term Forecasts of the COVID-19 Epidemic in Guangdong and Zhejiang, China
Non-linear Transformations of Divergent and Slowly Convergent Sequences
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Simulating the infected population and spread trend of 2019-nCov under different policy by EIR model