key: cord-0934046-koss2j0j authors: Zhang, Hongzhe; Zhao, Xiaohang; Yin, Kexin; Yan, Yiren; Qian, Wei; Chen, Bintong; Fang, Xiao title: Dynamic estimation of epidemiological parameters of COVID-19 outbreak and effects of interventions on its spread date: 2021-03-10 journal: J Public Health Res DOI: 10.4081/jphr.2021.1906 sha: 1c888a2137235cfeb16b92cae8130ca1fc9699be doc_id: 934046 cord_uid: koss2j0j Background: A key challenge in estimating epidemiological parameters for a pandemic such as the initial COVID-19 outbreak in Wuhan is the discrepancy between the officially reported number of infections and the true number of infections. A common approach to tackling this challenge is to use the number of infections exported from the originating city to infer the true number. This approach can only provide a static estimate of the epidemiological parameters before the city lockdown because there are almost no exported cases thereafter. Methods: We propose a Bayesian estimation method that dynamically estimates the epidemiological parameters by recovering true numbers of infections from day-to-day official numbers. To illustrate the use of this method, we provide a comprehensive retrospective analysis of how COVID-19 progressed in Wuhan from January 19 to March 5, 2020. In particular, we estimate that the outbreak sizes by January 23 and March 5 were 11,239 [95% CI 4,794–22,372] and 124,506 [95% CI 69,526–265,113], respectively. Results: The effective reproduction number attained its maximum on January 24 (3.42 [95% CI 3.34–3.50]) and became less than 1 from February 7 (0.76 [95% CI 0.65–0.92]). We also estimate the effects of two major government interventions on the spread of COVID-19 in Wuhan. 
Conclusions: This case study using our proposed method affirms the widely believed importance and effectiveness of tight restrictions on nonessential travel and of government interventions (e.g., transportation suspension and large-scale hospitalization) for mitigating COVID-19 community spread.
A novel coronavirus has quickly spread across the world since December 2019. [1] To combat this global public health crisis, an essential early step to contain or slow the outbreak of COVID-19 (i.e., the disease caused by the novel coronavirus) is to uncover its epidemiological parameters over time so that we can analyze the effect of different interventions on its spread; [2] methodological progress in this direction also has important impact and is generally applicable in guiding the public health response to future epidemic events beyond COVID-19. Toward that end, a number of studies have attempted to estimate its epidemiological parameters, such as the number of infected cases and the reproduction number. [1-8] A key challenge for these studies is that the officially reported number of infections (hereafter referred to as the official number) could be much lower than the true number of infections. In this paper, we use the early period of the COVID-19 pandemic at its epicenter in China, the city of Wuhan, [9] as our main illustrative case study of this challenge. This under-reporting problem could be attributed to many factors, such as an insufficient supply of virus test kits and a shortage of hospital beds. A common approach to tackling the under-reporting problem is to use the official number of infected cases exported from Wuhan to infer the true number of infections within Wuhan, assuming that, outside the city, the official number is close to the true number. [3, 4, 6] For example, Wu et al. 
[4] use the number of cases exported from Wuhan internationally to infer the true number of infections in Wuhan, whereas Cao et al. [3] employ the official number of cases exported from Wuhan domestically. This approach can only provide a static estimate of the epidemiological parameters before January 23, 2020, because there were almost no exported cases from Wuhan after the Wuhan lockdown took effect on January 23, 2020. [10] However, the epidemiological parameters of COVID-19 are dynamic, partly because of various interventions over time. It is therefore imperative to estimate the epidemiological parameters of the COVID-19 outbreak dynamically and beyond January 23, 2020.
In fighting a global pandemic such as COVID-19, an important early task for understanding the spread is to closely monitor the infection size and assess the epidemiological parameters of the disease. The insights gained from epidemiological parameter estimation enable public health practitioners to dynamically monitor the temporal spread trend and to quantitatively analyze the effectiveness of new public health policies. In this paper, we aim to address a key technical challenge potentially arising from under-reporting in the early period of a pandemic, and critically re-examine the COVID-19 situation at the initial epicenter, the city of Wuhan, as a practically relevant case study. Methodological development for modeling dynamic evolution involving parameter estimation therefore has important public health applications and is expected to have significant impact on modeling practice for understanding future epidemic events well beyond COVID-19.
We solve the under-reporting problem from a distinctive perspective. Rather than relying on cases exported from Wuhan, we propose a method to dynamically estimate the epidemiological parameters of the COVID-19 outbreak in Wuhan over time by transforming day-to-day official numbers of infections. 
Specifically, we propose a general Bayesian estimation method that seamlessly integrates an epidemic model characterizing the spread mechanism of the disease and a salient transformation approach, coupled with prior knowledge on key parameters of the epidemic model. Our proposed method has the following distinguishing features compared to existing methods. First, we tackle the under-reporting problem by proposing a straightforward yet effective transformation approach that adjusts for potential discrepancies between official and true numbers, giving a better overall picture of the scope of the COVID-19 outbreak and thereby more reliably quantifying its key epidemiological parameters. Second, our approach conveniently incorporates the fast-evolving knowledge from the new COVID-19 literature to generate well-justified and more refined parameter estimates with uncertainty quantification. Furthermore, the dynamic estimation over time keeps track of the evolving disease spread in response to interventions and holds the promise of objectively monitoring and evaluating the effectiveness of various containment measures. Our retrospective analysis uncovers and demonstrates the evolution of the COVID-19 outbreak in Wuhan from January 19, 2020 to March 5, 2020. In particular, for every day in this period, we apply the proposed method to estimate the effective reproduction number as well as true numbers of infections, such as the cumulative number of infected cases and the number of actively infected but not quarantined cases. Our proposed method also produces daily under-reporting factors, which indicate the degree of discrepancy between official and true numbers. Finally, using the dynamic epidemiological parameters estimated by our analysis, we evaluate the effects of two major interventions on the spread of COVID-19 in Wuhan. 
We obtained data about the COVID-19 outbreak in Wuhan from official reports released by the Chinese Center for Disease Control and Prevention (CCDC) between January 18, 2020 and March 5, 2020. CCDC provides the daily cumulative numbers of infected cases and removed cases (i.e., recoveries and deaths). Let $C_t^o$ denote the cumulative number of infected cases by day $t$ and $R_t^o$ the cumulative number of removed cases by day $t$, both officially released by CCDC. Assuming that all the officially confirmed infections have been effectively quarantined (e.g., hospitalized), we have $Q_t^o = C_t^o - R_t^o$, where $Q_t^o$ is the official number of actively infected and quarantined cases by day $t$.
It is worth noting that the daily number of newly infected cases increased dramatically to 13,436 on February 12, 2020 from 1,104 the day before, according to CCDC. This surge was attributed to a change in the government criteria for confirming infections. Before February 12, 2020, only those who tested positive with test kits were considered infected. Starting from February 12, 2020, an infection was confirmed either by a positive test result or through clinical diagnosis using computed tomography (CT) scans. As a result, infections suspected on the basis of CT scans before February 12, 2020 were relabeled as confirmed infections on February 12, 2020. It is therefore necessary to adjust the number of newly infected cases on February 12, 2020 (i.e., 13,436) by reallocating this number to the days prior to and including February 12, 2020, in proportion to the number of daily suspected cases on those days. Our analysis uses only publicly available data for secondary data analysis that involves neither human subjects research nor making data individually identifiable.
We assume that the diffusion of COVID-19 in Wuhan follows an epidemic model whose underlying time-dependent state variables $(S_t, I_t, Q_t, R_t)$ come from a dynamic system with system parameters $\Theta_H = (\beta, \mu, \gamma)$. 
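The bookkeeping on the official counts described above can be sketched in a few lines of Python: the official quarantined count is the cumulative confirmed count minus the cumulative removed count (which follows from the assumption that all confirmed cases are quarantined), and the February 12 surge is spread over the days up to and including that date in proportion to daily suspected cases. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, and every count other than 1,104 and 13,436 is a made-up placeholder.

```python
def official_quarantined(cum_infected, cum_removed):
    """Official actively infected and quarantined cases: Q_t = C_t - R_t."""
    return [c - r for c, r in zip(cum_infected, cum_removed)]


def reallocate_surge(new_cases, suspected, surge_day):
    """Spread the count reported on surge_day over days 0..surge_day,
    in proportion to the daily suspected cases on those days."""
    amount = new_cases[surge_day]
    total_suspected = sum(suspected[: surge_day + 1])
    adjusted = [float(x) for x in new_cases]
    adjusted[surge_day] = 0.0  # the whole surge count is redistributed
    for day in range(surge_day + 1):
        adjusted[day] += amount * suspected[day] / total_suspected
    return adjusted


# Hypothetical four-day window ending on the surge day.
# Only 1104 (day before) and 13436 (surge day) come from the text.
new_cases = [800, 1000, 1104, 13436]
suspected = [1000, 2000, 3000, 4000]  # hypothetical daily suspected cases
adjusted = reallocate_surge(new_cases, suspected, surge_day=3)
```

The reallocation only moves cases between days, so the total over the window is unchanged; days with more suspected cases receive a larger share of the surge.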
These state variables and system parameters are summarized in Table 1; their meanings and the epidemic model will be elaborated in the next subsection. In particular, $Q_t$ represents the number of actively infected and quarantined cases by day $t$ and $R_t$ represents the cumulative number of removed cases by day $t$. Ideally, we would observe the actual diffusion of COVID-19 over time, that is, the stochastically realized true values of $Q_t$ and $R_t$ for $t = 1, 2, 3, \dots$, denoted as $Q_t^e$ and $R_t^e$. In general, if the realized true values of all state variables were known, we could estimate the system parameters $\Theta_H$ using well-developed statistical methods (e.g., [11-13] from frequentist perspectives). In reality, we only observe a subset of the state variables through their officially reported numbers $Q_t^o$ and $R_t^o$. Due to the under-reporting problem, these official numbers, $Q_t^o$ and $R_t^o$, could be much lower than $Q_t^e$ and $R_t^e$, respectively. As a result, directly applying an existing method to $Q_t^o$ and $R_t^o$ may not reliably uncover the epidemiological parameters of COVID-19. To address this issue, we propose transformation functions that aim to recover $Q_t^e$ and $R_t^e$ from the observed $Q_t^o$ and $R_t^o$ with some (unknown) transformation parameters $\Theta_f$. Within this framework, we need to estimate the parameters $\Theta_H$ and $\Theta_f$. Instead of using frequentist approaches (such as maximum likelihood estimation, or MLE), we develop a Bayesian approach for our problem because of the following considerations. First, the Bayesian approach allows us to incorporate existing knowledge on COVID-19 to give a guided estimation of $\Theta_H$ through well-informed prior selection, while the MLE approach would have to largely ignore the valuable information from prior literature. Second, the posterior distribution, given our proposed modeling strategy and prior, has a clear interpretation and can provide straightforward uncertainty quantification. 
To our knowledge, the MLE approach for our specified model settings has no well-developed inference theory for the estimators. Third, from a practical perspective, our Bayesian sampling scheme (described in the subsection on Parameter Estimation) for the posterior distributions is straightforward to derive and implement, while the MLE estimator is more computationally involved and difficult to obtain. As an overview, Figure 1 summarizes all the essential components of our Bayesian modeling scheme for an epidemic model with the transformation functions proposed above; the technical details are described in the following subsections.
Recent evidence has shown that non-symptomatic infected cases and infected cases in their latent period can spread COVID-19 with high efficiency, e.g., Chang et al. [14] In alignment with these findings, we adopt a Susceptible-Infective-Quarantined-Removed (SIQR) compartmental model to characterize the diffusion of COVID-19. [15] The susceptible compartment of the model consists of those who can be infected. The infective compartment is composed of those who are actively infected but not quarantined, with or without symptoms. Those who are actively infected and quarantined are in the quarantined compartment. The removed compartment consists of those who recover or die from the disease. The state variables of the epidemic model, $S_t$, $I_t$, $Q_t$, $R_t$, are defined in Table 1, and the population size is $N = S_t + I_t + Q_t + R_t$. [16] The SIQR model is defined by a system of ordinary differential equations (ODEs) over these state variables. In these ODEs, $\beta$ is the adequate contact rate, where adequate contacts refer to contacts sufficient for transmission; [17] $\mu$ is the rate at which an infected case gets quarantined, and $\gamma$ is the rate at which a quarantined case becomes removed. 
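To make the compartment flows concrete, the following sketch integrates a SIQR system with a simple forward-Euler scheme. The ODEs themselves are not reproduced in the text above, so the sketch assumes the standard-incidence form: new infections at rate βSI/N, quarantine at rate μI, and removal at rate γQ. This assumed form, the function name `simulate_siqr`, and every parameter value and initial condition are illustrative and not taken from the paper.

```python
def simulate_siqr(beta, mu, gamma, s0, i0, q0, r0, days, steps_per_day=100):
    """Forward-Euler integration of an assumed standard-incidence SIQR model:
        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - mu * I
        dQ/dt =  mu * I - gamma * Q
        dR/dt =  gamma * Q
    Returns one (S, I, Q, R) tuple per day, including day 0."""
    n = s0 + i0 + q0 + r0
    s, i, q, r = float(s0), float(i0), float(q0), float(r0)
    dt = 1.0 / steps_per_day
    path = [(s, i, q, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infected = beta * s * i / n * dt   # S -> I
            new_quarantined = mu * i * dt          # I -> Q
            new_removed = gamma * q * dt           # Q -> R
            s -= new_infected
            i += new_infected - new_quarantined
            q += new_quarantined - new_removed
            r += new_removed
        path.append((s, i, q, r))
    return path


# Hypothetical rates and a hypothetical population of 10,000.
path = simulate_siqr(beta=0.6, mu=0.2, gamma=0.1,
                     s0=9990, i0=10, q0=0, r0=0, days=30)
population = sum(path[0])
cumulative_infected = [population - s for (s, _, _, _) in path]
```

Because every flow leaves one compartment and enters another, S + I + Q + R stays equal to N throughout, and N minus S gives the cumulative number of infections by each day.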
In the SIQR model, the effective reproduction number $R$ and the cumulative number $M_t$ of infected cases by day $t$ are given by [15, 18] (eq.3) (eq.4) Let $\Delta Q_t^e = Q_t^e - Q_{t-1}^e$ be the true daily increase in the number of infected and quarantined cases at day $t$. Similarly, let $\Delta Q_t^o = Q_t^o - Q_{t-1}^o$ be the officially reported daily increase in the number of infected and quarantined cases at day $t$, i.e., the official counterpart of $\Delta Q_t^e$. Due to the under-reporting problem, $\Delta Q_t^o$ tends to be smaller than $\Delta Q_t^e$. Assuming that the daily increase in the number of infected and quarantined cases is under-reported in a consistent manner within a short time window, we model the relationship between $\Delta Q_t^o$ and $\Delta Q_t^e$ as $\Delta Q_t^o = a\,\Delta Q_t^e$ (eq.5) where 0