key: cord-1020932-pvb5c4ih authors: Liu, Zhihua; Magal, Pierre; Seydi, Ousmane; Webb, Glenn title: Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data date: 2020-03-13 journal: nan DOI: 10.1101/2020.03.11.20034314 sha: f5adadd5b7db2309ee5ef42e09ac078937524e4e doc_id: 1020932 cord_uid: pvb5c4ih
We model the COVID-19 coronavirus epidemic in China. We use early reported case data to predict the cumulative number of reported cases to a final size. The key features of our model are the timing of implementation of major public policies restricting social movement, the identification and isolation of unreported cases, and the impact of asymptomatic infectious cases.
Many mathematical models of the COVID-19 coronavirus epidemic in China have been developed, and some of these are listed in our references [4, 7, 9, 10, 11, 12, 13, 14, 15]. We develop here a model describing this epidemic, focused on the effects of the public policies imposed by the Chinese government to contain this epidemic, and on the number of reported and unreported cases that have occurred. Our model here is based on our model of this epidemic in [5], which was focused on the early phase of this epidemic (January 20 through January 29) in the city of Wuhan, the epicenter of the early outbreak. During this early phase, the cumulative number of reported cases grew exponentially. In [5], we identified a constant transmission rate corresponding to this exponential growth rate of the cumulative reported cases, during this early phase in Wuhan.
On January 23, 2020, the Chinese government imposed major public restrictions on the population of Wuhan. Soon after, the epidemic in Wuhan passed beyond the early exponential growth phase, to a phase with slowing growth. In this work, we assume that these major government measures caused the transmission rate to change from a constant rate to a time-dependent exponentially decreasing rate.
We identify this exponentially decreasing transmission rate based on reported case data after January 29. We then extend our model of the epidemic to the central region of China, where most cases occurred. Within just a few days after January 29, our model can be used to project the time-line of the epidemic forward in time, with increasing accuracy, to a final size.
The model consists of the following system of ordinary differential equations:
\[
\begin{aligned}
S'(t) &= -\tau(t)\, S(t)\, [I(t) + U(t)], \\
I'(t) &= \tau(t)\, S(t)\, [I(t) + U(t)] - \nu I(t), \\
R'(t) &= \nu_1 I(t) - \eta R(t), \\
U'(t) &= \nu_2 I(t) - \eta U(t).
\end{aligned}
\tag{2.1}
\]
This system is supplemented by initial data
\[
S(t_0) = S_0 > 0, \quad I(t_0) = I_0 > 0, \quad R(t_0) = 0, \quad U(t_0) = U_0 \ge 0. \tag{2.2}
\]
Here t ≥ t_0 is time in days, t_0 is the beginning date of the model of the epidemic, S(t) is the number of individuals susceptible to infection at time t, I(t) is the number of asymptomatic infectious individuals at time t, R(t) is the number of reported symptomatic infectious individuals at time t, and U(t) is the number of unreported symptomatic infectious individuals at time t. Asymptomatic infectious individuals I(t) are infectious for an average period of 1/ν days. Reported symptomatic individuals R(t) are infectious for an average period of 1/η days, as are unreported symptomatic individuals U(t). We assume that reported symptomatic infectious individuals R(t) are reported and isolated immediately, and cause no further infections. The asymptomatic individuals I(t) can also be viewed as having a low-level symptomatic state. All infections are acquired from either I(t) or U(t) individuals. The parameters of the model are listed in Table 1 and a schematic diagram of the model is given in Figure 1.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 13, 2020. .
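As a small worked step on system (2.1), the last two equations can be added to see how the symptomatic class behaves as a whole. The sketch assumes that the two outflow rates split the total rate ν according to the fraction f of symptomatic cases that are reported, that is ν_1 = f ν and ν_2 = (1 − f) ν; this split is an assumption here, since Table 1 is not reproduced in this text.

```latex
\begin{aligned}
\nu_1 + \nu_2 &= f\nu + (1-f)\nu = \nu, \\
\frac{d}{dt}\bigl[R(t) + U(t)\bigr]
  &= (\nu_1 + \nu_2)\, I(t) - \eta\,\bigl[R(t) + U(t)\bigr]
   = \nu I(t) - \eta\,\bigl[R(t) + U(t)\bigr].
\end{aligned}
```

Under that assumption, every individual leaving the asymptomatic class at rate ν enters either R or U, and the reporting fraction f only decides which of the two.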
We use cumulative reported data from the National Health Commission of the People's Republic of China and the Chinese CDC for mainland China. Before February 11, the data was based on confirmed testing. From February 11 to February 15, the data included cases that were not tested for the virus, but were clinically diagnosed based on medical imaging showing signs of pneumonia. There were 17,409 such cases from February 10 to February 15. The data from February 10 to February 15 specified both types of reported cases. From February 16, the data did not separate the two types of reporting, but reported the sum of both types. We subtracted 17,409 cases from the cumulative reported cases after February 15 to obtain the cumulative reported cases based only on confirmed testing after February 15. The data is given in Table 2. We plot the data for daily reported cases and the cumulative reported cases in Figure 2.
We assume f = 0.8, which means that 20% of symptomatic infectious cases go unreported. We assume η = 1/7, which means that the average period of infectiousness of both unreported symptomatic infectious individuals and reported symptomatic infectious individuals is 7 days. We assume ν = 1/7, which means that the average period of infectiousness of asymptomatic infectious individuals is 7 days. These values can be modified as further epidemiological information becomes known.
As illustrated in our previous work [5], we assume that in the early phase of the epidemic, the cumulative number of reported cases grows approximately exponentially, according to the formula:
\[
CR(t) = \chi_1 \exp(\chi_2 t) - \chi_3,
\]
with values χ_1 = 0.15, χ_2 = 0.38, χ_3 = 1.0. These values of χ_1, χ_2, and χ_3 are fitted to reported case data from January 19 to January 28.
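The early-phase growth law can be evaluated directly from the fitted values. The following minimal sketch assumes the form CR(t) = χ_1 exp(χ_2 t) − χ_3 from [5], with t in days counted so that t = 19 corresponds to January 19; the function and variable names are illustrative and not taken from the paper.

```python
import math

CHI1, CHI2, CHI3 = 0.15, 0.38, 1.0  # fitted values quoted in the text


def cumulative_reported(t):
    """Early-phase cumulative reported cases, ASSUMING CR(t) = chi1*exp(chi2*t) - chi3."""
    return CHI1 * math.exp(CHI2 * t) - CHI3


# Time at which the fitted curve crosses zero, i.e. CR(t0) = 0.
t0 = (math.log(CHI3) - math.log(CHI1)) / CHI2

# Doubling time of the exponential term chi1*exp(chi2*t).
doubling_time = math.log(2.0) / CHI2

print(f"t0 = {t0:.2f} days, doubling time = {doubling_time:.2f} days")
for day in (19, 22, 25, 28):
    print(f"January {day}: CR = {cumulative_reported(day):.0f}")
```

Solving CR(t_0) = 0 gives a starting time that rounds to 5.0, which matches the value t_0 = 5.0 used later in the simulations.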
We assumed the initial value S_0 = 11,000,000, the population of the city of Wuhan, which was the epicenter of the epidemic outbreak where almost all cases in China occurred in this time period. The other initial conditions are I_0 = 3.3 and U_0 = 0.18. The value of the transmission rate τ(t), during the early phase of the epidemic, when the cumulative number of reported cases was approximately exponential, is the constant value τ_0 = 4.51 × 10^{−8}. The initial time is t_0 = 5.0. These parameter values, and the value of the basic reproductive number, follow from formulas derived in [5].
After January 23, strong government measures in all of China, such as isolation, quarantine, and public closings, strongly impacted the transmission of new cases. The actual effects of these measures were complex, and we use an exponential decrease for the transmission rate τ(t) to incorporate these effects after the early exponentially increasing phase. The formula for τ(t) during the exponentially decreasing phase was derived by a fitting procedure. The formula for τ(t) is
\[
\tau(t) =
\begin{cases}
\tau_0, & 0 \le t \le N, \\
\tau_0 \exp\bigl(-\mu (t - N)\bigr), & N < t.
\end{cases}
\tag{4.1}
\]
The date N and the value of µ are chosen so that the cumulative reported cases in the numerical simulation of the epidemic align with the cumulative reported case data during a period of time after January 19. We choose N = 25 (January 25) for our simulations. We illustrate τ(t) in Figure 3, with µ = 0.16. In this way we are able to project forward the time-path of the epidemic after the government imposed public restrictions, as it unfolds.
We assume that the exponentially increasing phase of the epidemic (as incorporated in τ_0) is intrinsic to the population of any subregion of China, after it has been established in the epidemic epicenter Wuhan.
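The parameter values of this section can be recomputed from the fitted growth constants. The sketch below is an illustration only: the three closed-form expressions are assumed to be the ones derived in [5] (they are not reproduced in this text), the split ν_1 = f ν, ν_2 = (1 − f) ν is likewise an assumption, and all function and variable names are hypothetical. The piecewise transmission rate follows the description of a constant rate up to day N and an exponential decrease with rate µ afterwards.

```python
import math

F, NU, ETA = 0.8, 1.0 / 7.0, 1.0 / 7.0   # values assumed in the text
CHI2, CHI3 = 0.38, 1.0                   # fitted early-growth constants
S0 = 11_000_000                          # population of Wuhan

NU1, NU2 = F * NU, (1.0 - F) * NU        # ASSUMPTION: split of nu by f

# ASSUMED formulas from [5] for the initial state and early transmission rate.
i0 = CHI2 * CHI3 / (F * NU)
u0 = (1.0 - F) * NU / (ETA + CHI2) * i0
tau0 = (CHI2 + NU) / S0 * (ETA + CHI2) / (NU2 + ETA + CHI2)


def tau(t, mu=0.16, n_day=25.0):
    """Transmission rate: constant up to day n_day, then exponentially decreasing."""
    if t <= n_day:
        return tau0
    return tau0 * math.exp(-mu * (t - n_day))


print(f"I0 = {i0:.2f}, U0 = {u0:.2f}, tau0 = {tau0:.3e}")
print(f"tau(25) = {tau(25.0):.3e}, tau(40) = {tau(40.0):.3e}")
```

With these assumed formulas the computed values round to I_0 = 3.3, U_0 = 0.18, and τ_0 = 4.51 × 10^{−8}, the values used in the simulations, which is the main reason for believing the formulas are the intended ones.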
We also assume that the susceptible population S(t) is not significantly reduced over the course of the epidemic. We set τ_0 = 4.51 × 10^{−8}, t_0 = 5.0, I(t_0) = 3.3, U(t_0) = 0.18, and R(t_0) = 1.0, as in Section 4. We set S(t_0) in (2.2) to 1,400,050,000 (the population of mainland China). We set τ(t) in [...] where S_0 = 11,000,000 (the population of Wuhan). We thus assume that the government imposed restriction measures became effective in reducing transmission on January 25.
In Figure 4, we plot the graphs of CR(t), CU(t), R(t), and U(t) from numerical simulations based on six time intervals for known values of the cumulative reported case data. For each of these time intervals, a value of µ is chosen so that the simulation for that time interval aligns with the cumulative reported case data in that interval. In this way, we are able to predict the future values of the epidemic from early cumulative reported case data.
In Figure 5 we plot the graphs of the reported cases R(t), the unreported cases U(t), and the infectious pre-symptomatic cases I(t). The blue dots are obtained from the reported cases data (Table 2) for each day beginning on January 26, by subtracting from each day the value of the reported cases one week earlier.
Our model transmission rate τ(t) can be modified to illustrate the effects of an earlier or later implementation of the major public policy interventions that occurred in this epidemic. The implementation one week earlier (25 is replaced by 18 in (4.1)) is graphed in Figure 6A. All other parameters and the initial conditions remain the same. The total reported cases is approximately 4,500 and the total unreported cases is approximately 1,100.
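The simulations described above can be sketched numerically. This is a minimal illustration under several assumptions that are not confirmed by this text: the transmission rate is rescaled by S_0 / S(t_0) so that the early growth rate in the larger population matches the one fitted for Wuhan; the cumulative counts CR(t) and CU(t) are taken as the time integrals of ν_1 I(t) and ν_2 I(t) starting from zero at t_0; and a fixed-step Runge-Kutta scheme stands in for whatever solver the authors used. The function name `simulate` and its arguments are hypothetical.

```python
import math


def simulate(mu, n_day=25.0, t_end=70.0, dt=0.05):
    """Integrate system (2.1) from t0 to t_end and return the final (CR, CU)."""
    f, nu, eta = 0.8, 1.0 / 7.0, 1.0 / 7.0    # values assumed in the text
    nu1, nu2 = f * nu, (1.0 - f) * nu         # ASSUMPTION: split of nu by f
    tau0, t0 = 4.51e-8, 5.0                   # values quoted in the text
    s_wuhan, s_china = 11_000_000.0, 1_400_050_000.0
    base = tau0 * s_wuhan / s_china           # ASSUMPTION: rescaled transmission rate

    def tau(t):
        return base if t <= n_day else base * math.exp(-mu * (t - n_day))

    def rhs(t, y):
        s, i, r, u, cr, cu = y
        new_inf = tau(t) * s * (i + u)        # infections come from I and U only
        return [-new_inf, new_inf - nu * i, nu1 * i - eta * r,
                nu2 * i - eta * u, nu1 * i, nu2 * i]

    y = [s_china, 3.3, 1.0, 0.18, 0.0, 0.0]   # S, I, R, U, CR, CU at t0
    t = t0
    for _ in range(round((t_end - t0) / dt)): # classical fourth-order Runge-Kutta
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k1)])
        k3 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k2)])
        k4 = rhs(t + dt, [a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6 * (p + 2 * q + 2 * w + z)
             for a, p, q, w, z in zip(y, k1, k2, k3, k4)]
        t += dt
    return y[4], y[5]


# Compare intervention days N = 18, 25, 32 with mu held at 0.16.
for n in (18.0, 25.0, 32.0):
    cr, cu = simulate(mu=0.16, n_day=n, t_end=100.0)
    print(f"N = {n:.0f}: CR = {cr:,.0f}, CU = {cu:,.0f}")
```

Because CR and CU integrate the same I(t) in this sketch, their ratio stays fixed at (1 − f)/f = 0.25, in line with the roughly 4:1 split between reported and unreported totals quoted in the text. The absolute totals depend on the assumed rescaling and should not be expected to reproduce the figures exactly.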
The implementation one week later (25 is replaced by 32 in (4.1)) is graphed in Figure 6B. The total reported cases is approximately 820,000 and the total unreported cases is approximately 200,000. The timing of the institution of major social restrictions is critically important in mitigating the epidemic.
The number of unreported cases is of major importance in understanding the evolution of an epidemic, and is very difficult to estimate. The data from January 19 to February 15 for reported cases in Table 2 was only for confirmed tested cases. Between February 11 and February 15, additional clinically diagnosed case data, based on medical imaging showing signs of pneumonia, was also reported by the Chinese CDC. Since February 16, only tested case data has been reported by the Chinese CDC, because new NHC guidelines removed the clinically diagnosed category. Thus, after February 15, there is a gap in the reported case data that we used up to February 15. The uncertainty of the number of unreported cases for this epidemic includes this gap, but goes even further to include additional unreported cases.
We assumed previously that the fraction f of reported cases was f = 0.8 and the fraction of unreported cases was 1 − f = 0.2. Our model formulation can be applied with varying values for the fraction f. In Figure 7, we provide illustrations with the fraction f = 0.4 (Figure 7A) and f = 0.6 (Figure 7B).
In Figure 9, we illustrate the importance of the level of government imposed public restrictions by altering the value of µ in formula (4.1).
All other parameters and initial conditions are the same as in Figure 4. In Figure 9A we set µ = 0.0, corresponding to no restrictions. After 100 days, the final size is approximately 1,080,000,000 cumulative reported cases, approximately 270,000,000 unreported cases, and approximately 1,350,000,000 total cases. The turning point is approximately day 65 = March 6. In Figure 9B we set µ = 0.17, corresponding to a higher level of restrictions than in Figure 4. After 70 days, the final size is approximately 45,300 cumulative reported cases, approximately 11,300 unreported cases, and approximately 56,600 total cases. The turning point is approximately day 38 = February 7. The level and timing of government-imposed social restrictions are very important in controlling the epidemic.
We have developed a model of the COVID-19 epidemic in China that incorporates key features of this epidemic: (1) the importance of the timing and magnitude of the implementation of major government public restrictions designed to mitigate the severity of the epidemic; (2) the importance of both reported and unreported cases in interpreting the number of reported cases; and (3) the importance of asymptomatic infectious cases in the disease transmission. In our model formulation, we divide infectious individuals into asymptomatic and symptomatic infectious individuals. The symptomatic infectious phase is also divided into reported and unreported cases. Our model formulation is based on our work [5], in which we developed a method to estimate epidemic parameters at an early stage of an epidemic, when the number of cumulative cases grows exponentially.
This preprint is made available under a CC-BY-NC-ND 4.0 International license.
The general method in [5] was applied to the COVID-19 epidemic in Wuhan, China, to identify the constant transmission rate corresponding to the early exponential growth phase. In this work, we use the constant transmission rate in the early exponential growth phase of the COVID-19 epidemic identified in [5]. We model the effects of the major government imposed public restrictions in China, beginning on January 23, as a time-dependent exponentially decaying transmission rate after January 24. With this time-dependent exponentially decreasing transmission rate, we are able to fit our model simulations, with increasing accuracy, to the Chinese CDC reported case data for all of China, forward in time from February 15, 2020.
Our model demonstrates the effects of implementing major government public policy measures. By varying the date of the implementation of these measures in our model, we show that had implementation occurred one week earlier, then a significant reduction in the total number of cases would have resulted. We show that if these measures had occurred one week later, then a significant increase in the total number of cases would have occurred. We also show that if these measures had been less restrictive on public movement, then a significant increase in the total size of the epidemic would have occurred. It is evident that control of a COVID-19 epidemic is very dependent on an early implementation and a high level of restrictions on public functions.
We varied the fraction 1 − f of unreported cases involved in the transmission dynamics. We showed that if this fraction is higher, then a significant increase in the number of total cases results. If it is lower, then a significant reduction occurs. It is evident that control of a COVID-19 epidemic is very dependent on identifying and isolating symptomatic unreported infectious cases.
We also decreased the parameter ν (the reciprocal of the average period of asymptomatic infectiousness), and showed that the total number of cases is smaller. It is also possible to decrease η (the reciprocal of the average period of unreported symptomatic infectiousness), to obtain a similar result. It is evident that understanding of these periods of infectiousness is important in understanding the total number of epidemic cases. Our model was specified to the COVID-19 outbreak in China, but it is applicable to any outbreak location for a COVID-19 epidemic.
[1] Identifying the number of unreported cases in SIR epidemic models
[2] Parameter identification in epidemic models
[3] Parameter estimation in epidemic models: simplified formulas
[4] The continuing 2019-nCoV epidemic threat of novel corona viruses to global health - The latest 2019 novel corona virus outbreak in Wuhan, China
[5] Understanding unreported cases in the 2019-nCov epidemic outbreak in Wuhan, China, and the importance of major public health interventions (biology) March
[6] The parameter identification problem for SIR epidemic models: Identifying Unreported Cases
[7] Initial cluster of novel coronavirus (2019-nCoV) infections in Wuhan, China is consistent with substantial human-to-human transmission
[8] The Rate of Under ascertainment of Novel Coronavirus (2019-nCoV) Infection: Estimation Using Japanese Passengers Data on Evacuation Flights
[9] Real-time forecasts of the COVID-19 epidemic in China from
[10] IDM editorial statement on the 2019-nCoV
[11] An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov)
[12] Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions
[13] Novel coronavirus outbreak in Wuhan, China, 2020: Intense surveillance is vital for preventing sustained transmission in new locations
[14] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
[15] Estimating the unreported number of novel Coronavirus (2019-nCoV) cases in China in the first half of January 2020: A data-driven modelling analysis of the early outbreak