key: cord-0997598-9j5yda3k authors: Fu, Xinmiao; Ying, Qi; Zeng, Tieyong; Long, Tao; Wang, Yan title: Simulating and Forecasting the Cumulative Confirmed Cases of SARS-CoV-2 in China by Boltzmann Function-based Regression Analyses date: 2020-02-18 journal: nan DOI: 10.1101/2020.02.16.20023564 sha: b117016a887f05386acc4387dab5bf8c769cf90e doc_id: 997598 cord_uid: 9j5yda3k An ongoing outbreak of atypical pneumonia caused by the 2019 novel coronavirus (SARS-CoV-2) is hitting Wuhan City and has spread to other provinces/cities of China and overseas. It is very urgent to forecast the future course of the outbreak. Here, we provide an estimate of the potential total number of confirmed cases in mainland China by applying Boltzmann function-based regression analyses. We found that the cumulative numbers of confirmed cases from Jan 21 to Feb 14, 2020 for mainland China, Hubei Province, Wuhan City and other provinces were all well fitted with the Boltzmann function (R² being close to 0.999). The potential total numbers of confirmed cases in the above geographic regions were estimated, with 95% confidence intervals (CI), as 79589 (71576, 93855), 64817 (58223, 77895), 46562 (40812, 57678) and 13956 (12748, 16092), respectively. Notably, our results suggest that the number of daily new confirmed cases of SARS-CoV-2 in mainland China (including Hubei Province) will become minimal between Feb 28 and Mar 10, 2020, with 95% CI. In addition, we found that the data on cumulative confirmed cases of 2003 SARS-CoV in China and worldwide were also well fitted to the Boltzmann function. To our knowledge, this is the first study revealing that the Boltzmann function is suitable to simulate epidemics. The estimated potential total number of confirmed cases and key dates for the SARS-CoV-2 outbreak may provide guidance for governments, organizations and citizens to optimize preparedness and response efforts.
as well as in many foreign countries or regions including Japan, the Republic of Korea, Canada, USA, and European countries [2] [3] [4]. This SARS-CoV-2 outbreak was declared a public health emergency of international concern by the World Health Organization (WHO) on Jan 30 [5].

Much research progress has been made in dissecting the evolution and origin of SARS-CoV-2 [6-8], as well as characterizing its clinical features [9] [10] [11] [12] [13] [14] [15] and epidemics [16] [17] [18] [19] in the past one and a half months. These efforts should significantly help contain the SARS-CoV-2 epidemic.

While the outbreak is ongoing, people have raised grave concerns about its future trajectory, especially given that the return to work and school has already been substantially postponed beyond the scheduled end of the Chinese Lunar New Year holiday (Jan 31). There is strong demand for estimates of the potential total number of confirmed cases, both nationally and locally. Here we present Boltzmann function-based regression analyses on the data of confirmed cases of

Data were organized in Microsoft Excel and then imported into Microcal Origin software (note: Jan 21, 2020 was set as day 1, and so on). The Boltzmann function was fitted to each set of data for the different geographic regions (e.g., China, Hubei Province and so on) and the parameters of each function were obtained, with the potential total number of confirmed cases being directly given by parameter A2. Critical dates were estimated by predicting the cumulative number of confirmed cases in the days after Feb 14, 2020, and the key date was provisionally set as the date on which the number of daily new confirmed cases falls below 0.1% of the potential total number.
The Boltzmann function for simulation is expressed as follows:

$$ C(x) = A_2 + \frac{A_1 - A_2}{1 + e^{(x - x_0)/dx}} \qquad (1) $$

where C(x) is the cumulative number of confirmed cases at day x; A1, A2, x0, and dx are constants. In particular, A2 represents the estimated potential total number of confirmed cases of SARS-CoV-2. Details of the derivation of the Boltzmann function for epidemic analysis are described in the supporting information file.

A Monte Carlo technique was applied to assess the uncertainty in the estimated total number of confirmed cases due to the uncertainty in the reported numbers of cases. 1000 non-linear regressions were performed with the same time series data, but each data point in the time series was perturbed by multiplying it by a random scaling factor that represents the relative uncertainty. We assumed that the relative uncertainty follows a single-sided normal distribution with a mean of 1.0 and a standard deviation of 10%. This implies that all reported cases are positive but that some positive cases tend to go unreported, so that the reported numbers represent a lower limit. The resulting mean and 95% confidence interval (CI) are presented.

In light of daily reported cases of SARS-CoV-2 since Jan 21, 2020, we decided to collect data for analysis on the cumulative number of confirmed cases (initially from Jan 21 to Feb 10, 2020) in

from Jan 21 to Feb 14 for Hubei Province by a fixed factor (refer to Table S1), assuming that these newly added cases accumulated linearly over those days. The same was done for the data for Wuhan City.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. https://doi.org/10.1101/2020.02.16.20023564
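The fitting and Monte Carlo steps described in the Methods can be sketched in Python. This is a minimal illustration, not the authors' Origin workflow: the use of `scipy.optimize.curve_fit`, the starting guesses, the helper names `fit_boltzmann` and `monte_carlo_a2`, and the synthetic data are all assumptions made here. In particular, the "single-sided normal distribution" is read as a scaling factor of 1 + |N(0, 0.10)|, whose mean is about 1.08; the paper does not spell out the exact form.

```python
import numpy as np
from scipy.optimize import curve_fit


def boltzmann(x, a1, a2, x0, dx):
    """Boltzmann sigmoid: C(x) = A2 + (A1 - A2) / (1 + exp((x - x0) / dx))."""
    return a2 + (a1 - a2) / (1.0 + np.exp((x - x0) / dx))


def fit_boltzmann(days, cases):
    """Nonlinear least-squares fit; returns the array (A1, A2, x0, dx)."""
    # Starting guesses are a choice made for this sketch, not taken from the paper.
    p0 = [cases.min(), 2.0 * cases.max(), np.median(days), 3.0]
    params, _ = curve_fit(boltzmann, days, cases, p0=p0, maxfev=20000)
    return params


def monte_carlo_a2(days, cases, n_runs=1000, rel_sd=0.10, seed=0):
    """Refit perturbed copies of the series; return mean and 95% interval of A2."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_runs):
        # Assumed reading of "single-sided normal": factor = 1 + |N(0, rel_sd)|,
        # so every perturbed count is at least the reported count.
        factor = 1.0 + np.abs(rng.normal(0.0, rel_sd, size=cases.shape))
        try:
            estimates.append(fit_boltzmann(days, cases * factor)[1])
        except RuntimeError:
            continue  # curve_fit raises RuntimeError when the fit does not converge
    estimates = np.asarray(estimates)
    low, high = np.percentile(estimates, [2.5, 97.5])
    return estimates.mean(), low, high


# Synthetic, noise-free series for days 1..25 (made-up parameters, not real data).
days = np.arange(1, 26, dtype=float)
cases = boltzmann(days, 0.0, 70000.0, 15.0, 3.5)
a1, a2, x0, dx = fit_boltzmann(days, cases)
print(f"fitted A2 = {a2:.0f}, x0 = {x0:.2f}, dx = {dx:.2f}")
```

Calling `monte_carlo_a2(days, cases)` then returns the mean of the fitted A2 values together with the 2.5th and 97.5th percentiles, which is one common way to report a 95% interval from repeated fits.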
Fitting data on the confirmed cases of SARS-CoV-2 to Boltzmann function and estimating the potential total number of confirmed cases

Fitting analyses using the Boltzmann function indicate that all sets of data were well fitted with the function (all R² values being close to 0.999; Figs. 1, 2 and S1). Parameter A2 in the Boltzmann function directly represents the potential total number of confirmed cases (refer to equation 1). As summarized in Table 1, the potential total numbers of confirmed cases for mainland China, Hubei Province, Wuhan City, and other provinces were estimated as 72800±600, 59300±600, 42100±700 and 12800±100, respectively (also refer to Fig. 1); those for the six most affected provinces (Guangdong, Zhejiang, Henan, Hunan, Anhui and Jiangxi) were 1300±10, 1170±10, 1260±10, 1050±10, 1020±10 and 940±10, respectively (also refer to Fig. 2); those for the top-4 major cities (Beijing, Shanghai, Guangzhou and Shenzhen) were 394±4, 328±3, 337±3 and 397±4, respectively (also refer to Fig. S1).

In addition, we estimated the key date, which we subjectively defined as the date on which the number of daily new confirmed cases falls below 0.1% of the potential total number of confirmed cases. As summarized in Table 1, the key dates for mainland China, Hubei Province, Wuhan City and other provinces are Feb 28 or Feb 27. It appears that it will take approximately two weeks for mainland China to reach this state, such that the number of daily new confirmed cases of SARS-CoV-2 after the critical date is below 70.

The above analyses were performed assuming that the released data on the confirmed cases are precise. However, some positive cases tend to go unreported, such that the reported numbers represent a lower limit.
One typical example indicating this uncertainty is the sudden increase of more than 13 000 new confirmed cases in Hubei Province on Feb 12 after clinical features were officially accepted as a standard for infection confirmation. Another uncertainty might result from insufficient kits for viral nucleic acid detection at the early stage of the outbreak. We thus examined the effects of the uncertainty of the released data on the estimation of the potential total number of confirmed cases using a Monte Carlo method (for detail, refer to the Methods section). For simplicity, we assumed that the relative uncertainty of the reported data follows a single-sided normal distribution with a mean of 1.0 and a standard deviation of 10%.

Under the above conditions, the potential total numbers of confirmed cases of SARS-CoV-2 for different regions were estimated (Figs. 3, S2 and S3) and summarized in Table 1.

Such uncertainty analysis also allowed us to estimate the key dates at 95% CI. As summarized in Table 1, the key dates for mainland China, Hubei Province, Wuhan City, and other provinces would fall in (2/28, 3/10), (2/27, 3/10), (2/28, 3/10) and (2/27, 3/13), respectively (also refer to

Data on the confirmed cases of 2003 SARS-CoV were well fitted to Boltzmann function

The ongoing SARS-CoV-2 outbreak has undoubtedly evoked memories of the SARS-CoV

(Figs. 1, 2 and S1). More importantly, the data of 2003 SARS-CoV in China and worldwide were also well fitted to the function (Fig. 4).
These results, together with the fact that the Boltzmann function can be derived from a few assumptions (for detail, refer to the Methods section of the supporting information file), suggest that the Boltzmann function is suitable for analyzing the epidemics of coronaviruses like SARS-CoV and SARS-CoV-2. One advantage of this model is that parameter A2 directly gives an estimate of the potential total number of confirmed cases. In addition, unlike traditional epidemiological models that require much more detailed data for

indicating that the SARS-CoV-2 outbreak in China might not be as bad as thought. Notably, our results also suggest that the number of daily new confirmed cases will become minimal between Feb 28 and Mar 10 in mainland China (including Hubei Province) at 95% CI (Fig. 3A). This trend, if it occurs as predicted, may help citizens in China to relieve stress and anxiety, as many provinces and cities in China have suspended public transportation systems and even implemented house quarantines for all urban households [20]. In further support of these estimates by the Boltzmann function, the newly released cumulative numbers of confirmed cases in all the above geographic regions on Feb 15 and Feb 16 are very close to the predicted ones (refer to Table S2). Consistently, the parameters of the Boltzmann functions established by regression analyses of the data from Jan 21 to Feb 16, 2020 (as presented in Fig. S4) are highly similar to those obtained from the data from Jan 21 to Feb 14, 2020.
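The key-date criterion used in this work (daily new confirmed cases below 0.1% of the potential total, with Jan 21, 2020 as day 1) can be illustrated with a short Python sketch. Only A2 = 72800 (the fitted total for mainland China) comes from the text; the values of A1, x0 and dx below are hypothetical placeholders, since the fitted values are not given in this section, so the date produced is illustrative only. The helper names `day_to_date` and `key_day` are likewise made up for this example.

```python
import math
from datetime import date, timedelta


def boltzmann(x, a1, a2, x0, dx):
    """Cumulative confirmed cases at day x according to the Boltzmann function."""
    return a2 + (a1 - a2) / (1.0 + math.exp((x - x0) / dx))


def day_to_date(day):
    """Convert a day index to a calendar date, with day 1 = Jan 21, 2020."""
    return date(2020, 1, 21) + timedelta(days=day - 1)


def key_day(a1, a2, x0, dx, fraction=0.001, max_day=400):
    """First day after the inflection point x0 whose daily increase is below fraction * A2."""
    threshold = fraction * a2
    start = math.ceil(x0) + 1  # search only the declining phase of daily new cases
    for day in range(start, max_day + 1):
        daily_new = boltzmann(day, a1, a2, x0, dx) - boltzmann(day - 1, a1, a2, x0, dx)
        if daily_new < threshold:
            return day
    raise ValueError("threshold not reached within max_day")


# A2 from the text; A1, x0 and dx are hypothetical values chosen for illustration.
A1, A2, X0, DX = 0.0, 72800.0, 18.0, 4.0
day = key_day(A1, A2, X0, DX)
print(day, day_to_date(day))
```

The search starts after x0 because daily increases are also small at the very beginning of the outbreak; restricting it to the declining phase avoids reporting an early day as the key date.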
Nevertheless, our estimates based on the established Boltzmann functions are not guaranteed, mainly because of the uncertainty of the reported data (Figs. 3, S2 and S3). We estimated the potential total numbers (refer to Table 1) under the assumption that the relative uncertainty of the reported data follows a single-sided normal distribution with a mean of 1.0 and a standard deviation of 10%, and this deviation may be underestimated. If the real uncertainty of the data released by health commissions is larger than 10%, the potential total numbers of confirmed cases would accordingly increase, and the key dates would be postponed. Another limitation is that this estimate is based on the assumption that the overall conditions are not changing. This might not be true, given that in many regions workers have started to return to work half a month

Ethical approval or individual consent was not applicable.

All data and materials used in this work were publicly available, and are also available on request.

The funding agencies had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

The authors declare that they have no conflict of interest.

cases of Hubei Province and Wuhan City were re-adjusted for data fitting due to the suddenly added cases determined by clinical features (for detail, refer to the Results section and Table S1).
Figure 4 (panels A and B)

A novel coronavirus from patients with pneumonia in China
What to do next to control the 2019-nCoV epidemic?
The Novel Coronavirus Originating in Wuhan, China: Challenges for Global Health Governance
A novel coronavirus outbreak of global health concern
What next for the coronavirus response? Lancet, 2020
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
A new coronavirus associated with human respiratory disease in China
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Clinical characteristics of 2019 novel coronavirus infection in China
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China
Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak
A data driven time-dependent transmission rate for tracking an epidemic: a case study of 2019-nCoV
Data-based analysis, modelling and forecasting of the novel coronavirus (2019-nCoV) outbreak
Estimating the potential total number of novel Coronavirus (2019-nCoV) cases in Wuhan City
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Offline: 2019-nCoV-"A desperate plea