key: cord-1022768-m9jnxoe0 authors: You, Chong; Gai, Xin; Zhang, Yuan; Zhou, Xiaohua title: Determining the Covertness of COVID-19 — Wuhan, China, 2020 date: 2021-02-19 journal: China CDC Wkly DOI: 10.46234/ccdcw2021.048 sha: 0bb4c59a71c9c0dcc817ded32a9b54855694aec9 doc_id: 1022768 cord_uid: m9jnxoe0 INTRODUCTION: The coronavirus disease 2019 (COVID-19) pandemic has been going on for over a year and has reemerged in several regions. Therefore, understanding the covertness of COVID-19 is critical to more precisely estimating the pandemic size, especially the population of hidden carriers (those with very mild or no symptoms). METHODS: A stochastic dynamic model was proposed to capture the transmission mechanism of COVID-19 and to depict the covertness of COVID-19. The proposed model captured unique features of COVID-19, changes in the diagnosis criteria, and escalating containment measures. RESULTS: The model estimated that, for the epidemic in Wuhan, 79.8% (76.7%–82.7%) of the spread was caused by hidden carriers. The overall lab-confirmation rate in Wuhan up until March 8, 2020 was 0.17 (0.15–0.19). The diagnostic rate among patients with significant symptoms went up to 0.82 on March 8, 2020 from 0.43 on January 1, 2020 with escalating containment measures and nationwide medical supports. The probability of resurgence could be as high as 0.72 if containment measures were lifted after zero new reported (lab-confirmed or clinically confirmed) cases in a consecutive period of 14 days. This probability went down to 0.18 and 0.01 for measures lifted after 30 and 60 days, respectively. DISCUSSION: Consistent with the cases detected in Wuhan in mid-May, 2020, this study suggests that much of the COVID-19 pandemic is underreported and highly covert, which suggests that strict measures must be enforced continuously to contain the spread of the pandemic. 
The outbreak of coronavirus disease 2019 (COVID-19) has been going on for over a year and has been deemed a once-in-a-century health crisis (1). Many countries that believed they had gone through the worst are now again grappling with new outbreaks (1). A major driving force in the persistence of COVID-19 is transmission caused by hidden carriers with very mild or no symptoms who are unaware of their infection (2). Due to the covertness of COVID-19 and overburdened medical resources, low diagnostic rates have been observed globally. The number of infected cases estimated by seroprevalence in various regions of the US was 6 to 24 times higher than the reported number (3). Therefore, understanding the covertness of COVID-19 is critical to getting a more precise picture of the pandemic size, especially the population of hidden carriers, and accordingly, to making public health decisions such as when to lift containment measures. Previous studies have unraveled the covert features of COVID-19, including the incubation period, the proportion and transmissibility of asymptomatic infections (4-7), and overall reporting rates (7-8). Hao et al. (2020) provided an important perspective on the transmission dynamics of COVID-19 in Wuhan. Their key finding was that 87% of the infections before March 8, 2020 were not lab-confirmed (note that a lab-confirmed case is defined as a case who is symptomatic and tested positive for COVID-19; clinically confirmed cases and detected asymptomatic cases were excluded) (8).
However, due to limitations such as violation of the homogeneity assumption in compartmental dynamic models, flawed inference of transmissibility parameters, and the inability to distinguish the covert nature of COVID-19 from external factors (i.e., overburdened medical resources), the study failed to capture some important information on covertness, which led to an overoptimistic estimate of the all-clearance date in Wuhan (see Supplementary Material Appendix A for details, available in http://weekly.chinacdc.cn/). In this paper, inspired by the SAPHIRE model in Hao et al. (2020), an improved dynamic model was proposed to depict the covertness of COVID-19 and to gain a better understanding of the diagnostic rate and the probability of resurgence under different policies. The proposed model included 6 compartments: susceptible (S), exposed (E), presymptomatic infectious (P), infectious with significant symptoms (I), infectious without significant symptoms (A), and removed (R). The individuals in I had significant symptoms that would be diagnosable but might not be lab-confirmed. For example, when the medical system was seriously overburdened, even a severe symptomatic case might not be diagnosed or hospitalized promptly before the viral load had dropped to the level at which the infection could no longer be detected by tests. Meanwhile, compartment A included hidden carriers who were unlikely to be detected. Therefore, compared to the original SAPHIRE model, individuals in I or A would now show better homogeneity, which is a required underlying assumption in compartmental dynamic models. Individuals in A could only transit to R by losing transmissibility pathologically, while patients in I might reach R either by losing their transmissibility pathologically or by isolation upon lab-confirmation or clinical diagnosis. The transmission rate was set to vary over different time periods based on local social events, medical resources, and the implementation of containment measures.
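The compartment structure described above can be illustrated with a minimal stochastic sketch. This is not the authors' implementation: the function name, the daily binomial update scheme, and every default parameter value below are assumptions chosen for illustration (only the 0.26 fraction with significant symptoms echoes the 74% no/mild estimate reported in this paper). The variables `s, e, p, i, a, r` follow the order in which the six compartments are listed.

```python
import numpy as np


def simulate_outbreak(days, b, n_pop=1_000_000, init_exposed=100, seed=0,
                      rho=0.26, alpha=0.5, d_e=3.0, d_p=2.0, d_i=3.0, d_q=10.0):
    """Daily stochastic simulation of a six-compartment structure.

    All defaults are illustrative placeholders, not fitted estimates.
    b     : transmission rate (scalar, or one value per day)
    rho   : fraction of infections that develop significant symptoms
    alpha : transmissibility of presymptomatic and no/mild cases relative
            to significantly symptomatic cases
    d_e, d_p, d_i : mean latent, presymptomatic, infectious periods (days)
    d_q   : mean delay from symptom onset to isolation (days)
    """
    rng = np.random.default_rng(seed)
    b_daily = np.broadcast_to(np.asarray(b, dtype=float), (days,))
    s, e, p, i, a, r = n_pop - init_exposed, init_exposed, 0, 0, 0, 0
    history = []
    for day in range(days):
        # Force of infection: hidden carriers (p, a) are down-weighted by alpha.
        force = b_daily[day] * (i + alpha * (p + a)) / n_pop
        new_e = rng.binomial(s, 1.0 - np.exp(-force))
        new_p = rng.binomial(e, 1.0 - np.exp(-1.0 / d_e))
        onset = rng.binomial(p, 1.0 - np.exp(-1.0 / d_p))
        new_i = rng.binomial(onset, rho)      # significant symptoms
        new_a = onset - new_i                 # no or mild symptoms
        # Symptomatic cases leave by natural loss of transmissibility or isolation.
        out_i = rng.binomial(i, 1.0 - np.exp(-(1.0 / d_i + 1.0 / d_q)))
        # Hidden carriers leave only by natural loss of transmissibility.
        out_a = rng.binomial(a, 1.0 - np.exp(-1.0 / d_i))
        s -= new_e
        e += new_e - new_p
        p += new_p - onset
        i += new_i - out_i
        a += new_a - out_a
        r += out_i + out_a
        history.append((s, e, p, i, a, r))
    return np.array(history)


if __name__ == "__main__":
    hist = simulate_outbreak(days=60, b=1.0)
    print("active infections on the last day:", hist[-1, 2:5].sum())
```

Passing an array of length `days` for `b` gives a period-specific transmission rate, which is one simple way to mimic measures that escalate over time.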
The covertness of COVID-19 can be better understood by dividing the diagnostic rate (hereafter referred to as the lab-confirmation rate) into the following two components: 1) the natural characteristics of COVID-19 relating to covertness, i.e., the fraction of infectious virus carriers without significant symptoms throughout their course of disease; and 2) factors related to medical resources and containment measures, which could be replenished and help improve the diagnostic rate in the later stages of the outbreak. In the proposed model, these components were estimated separately. In addition, transmissibility in reality decreases towards the end of the infectious period (9); hence, the assumption of a constant transmission rate throughout compartments I and A might lead to an overestimation of the effective reproduction number in the early stages. The proposed model also took these issues into consideration (see Appendix B for the solution details and ODE system description, available in http://weekly.chinacdc.cn/). The same data from Wuhan for January 1 to February 29 as in Hao et al. (2020) were used here for comparison. A total of 5 time periods were identified, as in Hao et al. (2020), over which the transmission rate in Wuhan was allowed to vary. Based on the stochastic dynamic model, the effective reproduction number in Wuhan was 4.49 (4.01-5.00) and 4.10 (3.71-4.52), respectively, in the first 2 periods of January 1-9 (before Chunyun, the period of intense travel preceding the Spring Festival) and January 10-22 (Chunyun), then dropped dramatically to 1.00 (0.92-1.07), 0.43 (0.40-0.46), and 0.27 (0.22-0.33) in the later 3 periods (Figure 1B). These results implied that the pandemic had been preliminarily controlled since the third period. It was estimated that 79.8% (76.7%-82.7%) of the spread of the disease was caused by hidden carriers (namely the population in P and A).
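The two-component view of the lab-confirmation rate can be checked with simple arithmetic on estimates reported in this paper: a 0.74 share of no/mild cases, a diagnostic rate among significantly symptomatic patients rising from 0.43 to 0.82, and an overall lab-confirmation rate of 0.17. The sketch below assumes, as a simplification that is not stated in this form in the paper, that only significantly symptomatic cases can be lab-confirmed, so that the overall rate is the product of the two components. The function name is hypothetical.

```python
def lab_confirmation_bounds(frac_no_or_mild, diag_rate_start, diag_rate_end,
                            overall_rate):
    """Rough consistency check under the product assumption
    overall = (share with significant symptoms) x (diagnostic rate among them)."""
    frac_significant = 1.0 - frac_no_or_mild
    lower = frac_significant * diag_rate_start
    upper = frac_significant * diag_rate_end
    # Average diagnostic rate among significantly symptomatic cases that
    # would reproduce the reported overall rate.
    implied_average = overall_rate / frac_significant
    return lower, upper, implied_average


if __name__ == "__main__":
    lower, upper, implied = lab_confirmation_bounds(0.74, 0.43, 0.82, 0.17)
    print(f"overall rate should lie between {lower:.3f} and {upper:.3f}")
    print(f"implied average diagnostic rate: {implied:.3f}")
```

Under this assumption the reported overall rate of 0.17 falls between the two bounds, which suggests that the low overall rate is driven mainly by the natural covertness component rather than by the diagnostic rate among symptomatic patients.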
The estimated cumulative number of infections up until March 8 was 194,302 (170,691) and the overall lab-confirmation rate was 0.17 (0.15-0.19), which was of a similar order to that in Hao et al. (2020). However, such a low lab-confirmation rate was mostly caused by the high proportion of asymptomatic or very mild cases, which was estimated to be 0.74 (0.71-0.77), while the diagnostic rate among patients with significant symptoms (namely patients in I) went up to 0.82 on March 8 from 0.43 on January 1 because of the escalating containment measures and nationwide medical support (Figure 1C). This estimate indicated that most of those with significant symptoms in Wuhan would be promptly diagnosed in the later stage of the outbreak. The clearance of all active infections would occur on June 3 (May 15 to July 5), assuming the trend remained unchanged as in the fifth period. These estimates were considerably more consistent with the confirmed cases reported in Wuhan in May and with the hidden carriers detected in subsequent city-wide testing (10). With regard to decisions on continuous surveillance and interventions, the model found that if all measures were lifted after zero new reported (lab-confirmed or clinically confirmed) cases in a consecutive period of 14 days, the probability of resurgence was still as high as 0.72 (Figure 1D). This probability went down to 0.18 and 0.01 if measures were lifted after zero new confirmed cases in a consecutive period of 30 and 60 days, respectively. See Appendix C for more details, available in http://weekly.chinacdc.cn/. In this study, the model estimation is consistent with the number of cases detected in Wuhan in mid-May. In comparison with reproduction numbers in other published studies, the estimate in the first period is within the range but on the higher side (11-13).
This is possibly because most of the other published reproduction numbers were estimated for the period after the first period in this study (January 9), and because earlier data were not as complete, which might lead to an overestimation of the reproduction number. Furthermore, the results suggested that COVID-19 was highly covert, that the spread of the disease in Wuhan was mostly caused by hidden carriers, and that the probability of resurgence was high even if the measures were retained for 14 consecutive days after reaching 0 new reported cases, which may explain the resurgence in new infections over the past few months in other countries. As a result, continuous and sometimes even painstaking endeavors have to be made in order to contain the spread of the pandemic. Large-scale testing is encouraged towards the end of a significant outbreak to identify and quarantine hidden carriers before a city or nation can be safely reopened. In particular, the model implies that it took more than 4 months in Wuhan from a strict lockdown to the final clearance of all active infections. The model in this study can be extended to fit pandemic data outside Wuhan, though modifications are needed with respect to specific countries or regions. For example, in the United States, statistics on daily reported cases are publicly available, but 1) no distinction is made between symptomatic and asymptomatic cases; 2) the dates of symptom onset are mostly unavailable; and 3) most reported cases are only required to self-isolate, which means that even if a case is reported, there is still a chance of infecting others. Moreover, changes in nonpharmaceutical interventions within the region of interest also need to be tracked, for which (14) can be a helpful reference. To accommodate such differences, an additional compartment is needed for reported cases.
To conclude, the proposed model reflects the unique features of COVID-19, the changes in the diagnostic criteria, and the escalating containment measures, and hence the corresponding model estimates offer a better understanding of the diagnostic rate and the probability of resurgence under different policies. COVID-19 is highly covert: 74% (71%-77%) of the virus carriers had no or mild symptoms, 80% (77%-83%) of the spread of the disease was caused by those hidden carriers, and as a result, the probability of resurgence is high. This study shares some limitations with Hao et al. (2020), for example, the assumption of a homogeneous transmission rate within the population, which ignores heterogeneity between groups by sex, age, geographical region, and socioeconomic status. Moreover, the population movement in this study is modelled under the same relatively simple setting as in Hao et al. (2020). This is acceptable for the case of Wuhan due to the travel restrictions in place since January 23. More sophisticated modelling of travel flows is needed in order to generalize this model to other regions. Finally, the recently reported SARS-CoV-2 mutants may pose a potential challenge to the generalization of the proposed model. Even when control measures remain unchanged, the emergence and spread of new COVID-19 variants may change the transmission rate and thus make the epidemic trend deviate from our prediction. These issues will be explored in future research when more relevant epidemiological data are available.

Conflicts of interest: The authors declare that there is no conflict of interest.

The SAPHIRE model included 7 compartments: susceptible (S), exposed (E), presymptomatic infectious (P), ascertained infectious (I), unascertained infectious (A), isolation in hospital (H), and removed (R). For ease of understanding, R is best read as the pathological loss of transmissibility, to distinguish it from H.
An individual in S would be infected by individuals in P, I, or A, with different transmissibility, to get into E and then P after a latent period. At the time of symptom onset, an individual transitions from P to I or A depending on whether they would be lab-confirmed in the future, and the ascertainment rate (r) is the proportion of patients who would be lab-confirmed. For a case to be lab-confirmed, the patient must be both symptomatic and test positive, which means that individuals in I must be symptomatic, while those in A could be asymptomatic, in which case their symptom onset stage was just a hypothetical one included in the model for simplicity. The individuals in A would then lose their transmissibility pathologically and transition into R. Meanwhile, individuals in I would either lose their transmissibility pathologically (I to R) before they were confirmed and isolated in hospital, which implies that a patient can be no longer infectious but still test positive, or become isolated in hospital (I to H, losing their transmissibility physically) and eventually lose their transmissibility pathologically (H to R). The ascertainment rate r and the transmission rate vary across five time periods based on key events (e.g., Chunyun, the period of intense travel preceding the Spring Festival) and containment interventions. Unlike most other dynamic models, which fit the number of confirmed diagnoses at each time point, the numbers of individuals in the compartments of this model were not directly observable; the only observed quantity was the number of lab-confirmed cases who reported a given date of symptom onset. In consideration of the SAPHIRE model described above, the following four aspects can potentially be improved: 1) The initial ascertainment rate was estimated based on the assumption of perfect ascertainment in Singapore, ignoring asymptomatic individuals, which certainly gave an over-conservative estimate of r under the current model, as mentioned in Hao et al. (2020).
In addition, r should be a continuous function rather than a step function over the five time periods; see the justification in Appendix D. 2) The individuals in A could be very different from one another, including asymptomatic cases, mild cases, and severe cases, as evidenced by the deaths of clinically confirmed cases reported in (2); it is hence not optimal to assign the same transmission rate to all individuals in A (note that the proposed transmission rate in A was identical to that of the presymptomatic infectious period and was a fixed fraction of that in I). At the beginning of the pandemic, medical resources were overburdened. A was likely to contain a larger fraction of patients with severe symptoms, and thus its transmission rate would be close to that of I. When medical resources were replenished and strong screening and public awareness campaigns were implemented, the remaining unascertained cases should mostly have been asymptomatic or mildly symptomatic, and the transmission rate would be closer to that of P. See Appendix D for why this issue cannot be easily resolved. 3) Though only the data of lab-confirmed cases were used in the SAPHIRE model, isolation due to clinical diagnosis cannot simply be ignored in the model. As mentioned in Hao et al. (2020), the clinically diagnosed cases were excluded from the model. However, there were a significant number of cases in A of the SAPHIRE model who were not lab-confirmed but were clinically confirmed and isolated in fangcang (square-cabin) hospitals in Wuhan during February 2020, and who thus lost their transmissibility before they actually got into R, which implies that clinically confirmed cases in A would have a faster rate of getting into R than other cases in A (3). 4) The chosen value, in days, of the symptomatic infectious period is questionable. The symptomatic infectious period was the mean time from symptom onset to the pathological loss of transmissibility, and the value was calculated based on the claim by He et al. (2020) that 44% of secondary cases were infected during the index cases' presymptomatic stage (4).
Regardless of whether this claim is correct (a Matters Arising article responding to He et al. (2020) was published), the 44% of presymptomatic spread was estimated from confirmed cases under isolation measures outside Wuhan, which is certainly not appropriate for estimating the mean time from symptom onset to the pathological loss of transmissibility. Furthermore, another defect in the calculation of the symptomatic infectious period is an inconsistency in the study of Hao et al. (2020): a constant infectiousness was assumed across the presymptomatic and symptomatic phases of ascertained cases when estimating this period, while at the same time a separate ratio was used for the transmission rate of cases in P (presymptomatic) relative to that of cases in I (symptomatic). It is important to note that, unlike other predetermined parameters in the model, the value of the symptomatic infectious period is quite crucial to the model estimates of interest; see Supplementary Table S1 in Appendix D for details. Hence a more accurate choice of this value is essential. It is worth noting that the clearance of all active infections in Hao et al. (2020) was predicted to occur on April 21 (April 8 to May 12), which was not consistent with the active cases detected in Wuhan in mid-May.

Inspired by the SAPHIRE model in Hao et al. (2020), an improved dynamic model was proposed. The proposed model included 6 compartments: susceptible (S), exposed (E), presymptomatic infectious (P), infectious with significant symptoms (I), infectious without significant symptoms (A), and removed (R). Note that patients in A could only transit to R by losing transmissibility pathologically, while patients in I may reach R by losing their transmissibility pathologically, by isolating upon lab-confirmation, or by isolating upon clinical diagnosis.
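The three exit routes for significantly symptomatic patients can be read as competing risks. The sketch below assumes that each route has an exponentially distributed waiting time, which is the usual convention in compartmental ODE models but is an assumption here, so that the individual rates add up. The function names and the example durations of 3 and 10 days are hypothetical; an infinite duration simply switches a route off.

```python
import math


def total_exit_rate(d_infectious, d_lab, d_clinical=math.inf):
    """Combined daily exit rate when three exponential clocks run in parallel.

    d_infectious : mean days until natural loss of transmissibility
    d_lab        : mean days from onset to lab-confirmation and isolation
    d_clinical   : mean days from onset to clinical diagnosis
                   (math.inf means clinical diagnosis is not available)
    """
    return 1.0 / d_infectious + 1.0 / d_lab + 1.0 / d_clinical


def route_probabilities(d_infectious, d_lab, d_clinical=math.inf):
    """Probability that each route is the one that removes the patient."""
    total = total_exit_rate(d_infectious, d_lab, d_clinical)
    return {
        "natural": (1.0 / d_infectious) / total,
        "lab_confirmed": (1.0 / d_lab) / total,
        "clinical": (1.0 / d_clinical) / total,
    }


if __name__ == "__main__":
    before = total_exit_rate(3.0, 10.0)          # no clinical diagnosis route
    after = total_exit_rate(3.0, 10.0, 10.0)     # clinical diagnosis available
    print("mean infectious stay before:", 1.0 / before)
    print("mean infectious stay after: ", 1.0 / after)
    print(route_probabilities(3.0, 10.0, 10.0))
```

Opening an additional route always raises the combined rate, so the expected time a symptomatic patient remains infectious in the community shortens once clinical diagnosis becomes available.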
Therefore, the transition rate from I to R combines all three routes and is determined by the symptomatic infectious period, the duration from illness onset to lab-confirmation, and the duration from illness onset to clinical diagnosis; the last of these varies across time periods (here it is set to infinity before February 2 and to 10 days after February 2 in Wuhan). Furthermore, it has been found that transmissibility decays towards the end of the infectious period; hence, the assumption of a constant transmission rate throughout compartments I and A might lead to an overestimate of the effective reproduction number in the early stages. A preliminary solution to this problem is to split I into an early stage of the symptomatic infectious period with a higher transmissibility and a late stage with a low transmissibility, and likewise to split A into an early stage of the asymptomatic/mild infectious period with a higher transmissibility and a late stage with a low transmissibility. The transition dynamics of these states are as follows: (1) the transitions out of the early stages occur at a rate that corresponds to the setting in the SAPHIRE model, while the transitions out of the late stages occur at a second rate, so that the expectation of the symptomatic infectious period is consistent with the chosen value in days; (2) individuals in the early stages and individuals in the late stages have different transmission rates.

The model parameters are as follows: the transmission rate for significantly symptomatic cases in the early stage is unknown and varies across time periods (the five time periods are set as in Hao et al. (2020) for the demonstration in Wuhan); the ratio of the transmission rate of presymptomatic, asymptomatic, or mild cases to that of significantly symptomatic cases is fixed in advance; the fraction of infections with significant symptoms is unknown; the stage durations are defined as before and are set by assuming that the trends of viral load are independent of symptoms; and the latent period, the presymptomatic infectious period, the duration from illness onset to isolation, and the duration from illness onset to clinical diagnosis are all predetermined. Under this setting, it is reasonable for the transmission rate for A to be constant over time and equal to that for P. In addition, the lab-confirmation rate can be better presented as a function of the ratio between cases with insignificant (no/mild) and significant symptoms, and of the time-dependent ratio between the isolation/diagnosis speed and the removal speed.

Appendices E and F give the estimation method, the choices of initial values, the parameter settings, and the sensitivity analysis for the modified model. An estimation method similar to that in Hao et al. (2020) is used in this study. Note that all CIs without further specification are 95% CIs throughout this paper. The estimated cumulative number of infections up until March 8 was 194,302 (170,691) by fitting data from all 5 periods; this number increased to 198,748 (173,856-226,051) if the trend of the fourth period was assumed, 355,907 (301,525-418,579) if the trend of the third period was assumed, or 11,507,840 (10,688,446-12,336,104) if the trend of the second period was assumed; see Supplementary Figure S2A-C for details. These represented a 2.24%, 44.16%, and 96.91% reduction of infections by the measures taken in the fifth period, the fourth and the fifth periods combined, and the last three periods combined, respectively. Note that under the trend of the second period the total number of infections exceeded the total population of Wuhan.
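The quoted reduction percentages can be reproduced directly from the four scenario totals. The short sketch below uses only the point estimates stated above; the helper name `reduction` and the scenario labels are introduced here for illustration.

```python
def reduction(with_measures, without_measures):
    """Fractional reduction in cumulative infections between two scenarios."""
    return 1.0 - with_measures / without_measures


# Point estimates of cumulative infections up until March 8 (from the text).
scenarios = [
    ("fit to all five periods", 194_302),
    ("fourth-period trend continued", 198_748),
    ("third-period trend continued", 355_907),
    ("second-period trend continued", 11_507_840),
]

if __name__ == "__main__":
    # Compare each scenario with the next, less restrictive one.
    for (name_a, total_a), (name_b, total_b) in zip(scenarios, scenarios[1:]):
        pct = 100.0 * reduction(total_a, total_b)
        print(f"{name_a} vs {name_b}: {pct:.2f}% fewer infections")
```

Each quoted percentage matches the comparison of one scenario with the next, less restrictive one. Comparing the full five-period fit directly with the third-period scenario would instead give a reduction of roughly 45.4%, so the quoted figures appear to be stepwise rather than cumulative; this reading is an inference from the arithmetic, not a statement made in the paper.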
The total under the second-period trend exceeds the population of Wuhan because the population inflow and outflow in Wuhan was about 800,000 per day before lockdown; the estimated number of infections can therefore be regarded as the number of infections in or from Wuhan. The number of daily active infections (including cases in P, I, and A) peaked at 75,093 (64,342-87,068) on February 1 and dropped to 6 (1-13) on May 14 (Supplementary Figure S2D). If the trend remained unchanged as in the fifth period, the number of significantly symptomatic infections (I) would first become zero on April 12 (April 3 to April 23), and the clearance of all infections would occur on June 4 (June 15 to July 5). Compared with the estimates in Hao et al. (2020), the corresponding estimate here was much more heavy-tailed, implying the high covertness of COVID-19. Considering the few cases detected in Wuhan in mid-May, the estimate of the clearance of all active infections was more consistent with the official reports in Wuhan than what was predicted in Hao et al. (2020) (5-6). Regarding continuous surveillance and interventions, the modified model found that if control measures were lifted after zero new reported cases in a consecutive period of 14 days, the probability of resurgence, defined as the number of active significantly symptomatic cases exceeding 100, was still as high as 0.72, and the surge was predicted to occur on Day 41 (31-57) after lifting controls. Compared with the results in Hao et al. (2020), the estimates of the probability of resurgence were much higher, which suggests that continuous intervention efforts are essential to contain the spread of the pandemic.

The mean transmissibility in A as defined by Hao et al. (2020) should be a time- and configuration-dependent function, determined by averaging the transmissibility of patients with symptoms and of those with no or mild symptoms in A. Unfortunately, these two sub-populations of A were unidentifiable in the model structure proposed by Hao et al. (2020).
One relatively easy potential fix for this issue would be to replace the constant transmissibility ratio with a time-varying one and allow it to change over the different time steps in the paper. However, according to the definition of A, the number of patients who did not get diagnosed could only increase through symptom onset and decrease through the loss of infectivity. Thus, the evolution of the sub-population proportion within A must be continuous in time. Meanwhile, the changes in containment measures and lab-confirmation rates across the different steps in this model could only change the rate at which this proportion evolves, not the proportion itself. This made it inappropriate to treat the ratio as a step function. By a similar argument, the setting of r in Hao et al. (2020) as a step function is also questionable.

WHO. Virtual press conference on COVID-19 in the western Pacific
Seroprevalence of antibodies to SARS-CoV-2 in 10 Sites in the United States
Clinical characteristics of 24 asymptomatic infections with COVID-19 screened among close contacts in Nanjing
Reconstruction of the full transmission dynamics of COVID-19 in Wuhan
Chinese) 000 new "clinical diagnosed cases" confirmed in a day in Hubei
Temporal dynamics in viral shedding and transmissibility of COVID-19
Wuhan ended its 35 days of zero new cases
City-wide screening results (5 days) released in Wuhan, 58 new asymptomatic cases detected
Estimation of the asymptomatic ratio of novel coronavirus infections
Prediction of the COVID-19 outbreak in China based on a new stochastic dynamic model

A major obstacle in parameter inference for the modified model is that no compartment is now directly observable. To be precise, the inflow into I on a given day follows a Poisson distribution, while the observed data consisted only of those who had symptom onset on that day and would be diagnosed in the future, namely, a sub-population of that inflow.
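The observation model described here relies on a standard property of Poisson counts: if each individual in a Poisson-distributed inflow is retained independently with a fixed probability, the retained count is again Poisson, with its mean scaled by that probability (often called thinning). The simulation below is a minimal sketch of that property; the mean inflow of 20 and the retention probability of 0.3 are arbitrary illustrative values, not model estimates.

```python
import numpy as np


def thinned_counts(mean_inflow, keep_prob, n_days=200_000, seed=0):
    """Simulate daily Poisson inflows and the independently retained subset.

    mean_inflow : Poisson mean of the daily symptom-onset inflow
    keep_prob   : probability that an onset case is eventually diagnosed
    """
    rng = np.random.default_rng(seed)
    inflow = rng.poisson(mean_inflow, size=n_days)
    # Each case is kept independently, so the kept count is binomial
    # conditional on the inflow.
    observed = rng.binomial(inflow, keep_prob)
    return inflow, observed


if __name__ == "__main__":
    inflow, observed = thinned_counts(mean_inflow=20.0, keep_prob=0.3)
    # For a Poisson variable the mean and variance coincide; both should be
    # close to mean_inflow * keep_prob for the retained counts.
    print("observed mean:    ", observed.mean())
    print("observed variance:", observed.var())
```

The matching mean and variance of the retained counts are what allows the observed onset data to be modelled with a Poisson likelihood even though the full inflow is never seen.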
Therefore, the distribution of this sub-population was estimated under the model. Fortunately, by the thinning argument for a Poisson process, the size of the sub-population of interest follows a Poisson distribution whose mean is the mean inflow multiplied by the probability that a patient with symptom onset on that day would be diagnosed in the future. Moreover, given that the model durations, including the duration from symptom onset to the time point at which the test turns negative, are predetermined in Hao et al. (2020), the exact value of this probability can be calculated independently using the following stochastic viewpoint of the dynamic model: consider a time-dependent Poisson process (Poisson point measure) whose intensity differs across stages, and an independent time-homogeneous Poisson process. For each day, define the stopping times as the first jump times of these processes after that day. The diagnosis probability can then be calculated as the probability of an event defined by these stopping times. The specific values can either be calculated manually for each day or be approximated numerically using frequencies obtained from multiple independent stochastic simulations. Since the precision of numerical simulation is guaranteed by the law of large numbers and large deviation theory, simulation was used here to approximate the values with repeated stochastic realizations for each day. Supplementary Table S2 provides a list of parameter settings in the modified model for all five periods.

The initial values of the compartments were estimated as follows: (1) Consider three quantities: the number of cases with symptom onset during December 29 to 31 who would be lab-confirmed in the future; the initial lab-confirmation rate in Wuhan among symptomatic cases, which was calculated by assuming complete diagnosis of early symptomatic cases in Singapore; and the proportion of symptomatic patients (1, 7-9). The initial population of symptom-onset patients in Wuhan was then obtained from these three quantities. The split of this initial population between the compartments should be roughly consistent with the unknown diagnosable ratio in the ODE system of the modified model.
By the fixed-point iteration method, this ratio was estimated to be 0.26, which in turn determined the corresponding initial values. In the original paper, the corresponding quantities stood for the numbers of lab-confirmed cases with onset during January 12,