key: cord-0459530-qaschaga authors: Lai, Shiyang; Zhao, Tianqi; Fan, Ningyuan title: Inferring incubation period distribution of COVID-19 based on SEAIR Model date: 2020-07-22 journal: nan DOI: nan sha: b63f43c368f68f4b3bed56cc92edc8e5d5d6d97c doc_id: 459530 cord_uid: qaschaga
To reduce the biases of traditional survey-based methods, this paper proposes an epidemic-model-based approach to infer the incubation period distribution of COVID-19 from publicly reported confirmed case numbers. We construct an epidemic model, SEAIR, and use the dynamic transmission process it depicts to estimate, for eight affected countries, the probability that an exposed individual develops symptoms on each day after exposure. From these estimates we derive a general incubation period distribution for COVID-19. The proposed method avoids several biases of traditional survey-based methods. However, because the method rests on a mathematical model, the inferred results are somewhat sensitive to the parameter settings. The method should therefore be applied with a reasonable understanding of the epidemic under study.
The current outbreak of coronavirus disease 2019 (COVID-19) has developed into a global pandemic and led to a global crisis. So far, however, knowledge of COVID-19 is limited, especially regarding its incubation period distribution. The incubation period of an infectious disease is the time interval between infection and the onset of symptoms. Understanding the incubation period is helpful for disease control and surveillance, and it is also essential to the study of the mechanisms of disease transmission. For instance, the optimal quarantine period can be determined from the distribution of the incubation period. Despite its importance, an accurate distribution of the incubation period is difficult to obtain when data are limited.
The existing literature on estimating the incubation period of COVID-19 relies mainly on sampling and surveys of infected people. For example, Guan et al. (2020) reported summary statistics of the incubation period based on 291 patients who reported a clear, specific exposure date, and found a median incubation period of four days. Backer et al. (2020) estimated the distribution of the incubation period using 88 cases with a travel history from Wuhan, and showed that the Weibull distribution fits the data best, with a mean incubation period of 6.4 days (95% credible interval (CI): 5.6-7.7). Qin et al. (2020) used a renewal process to estimate the distribution of the incubation period from 1211 cases with clear dates of departure from Wuhan and of symptom onset. They suggested a median incubation period of 8.13 days (95% CI: 7.37-8.91) and a mean incubation period of 8.62 days (95% CI: 8.02-9.28).
However, the above-mentioned methods suffer from several drawbacks. First, surveys are subject to errors such as sampling error, coverage error, and measurement error. The sample sizes of current studies range from 88 to 1211, and with samples of this size it is challenging to make a reliable estimate of the incubation period distribution. Second, patients' recall bias is inevitable in surveys and may lead to erroneous results. Some patients may not know the exact exposure date, or may not remember their exposure history, so exposure dates may not be accurately monitored and recorded. Last but not least, the judgment of contact tracers also influences how the exposure date is determined and recorded, and thus the accuracy of the estimated incubation period.
To address the above shortcomings, we draw on the idea of the SEAIR model and propose a novel method to infer the probability distribution of the incubation period of COVID-19 from the daily number of confirmed cases. In contrast to the traditional survey method, the proposed method reduces the dependence on survey data. Since the simple SIR model was proposed by Kermack and McKendrick in 1927, a large number of advanced epidemiological models have been built on it to explain the transmission of different real-world infectious diseases (Calatayud et al., 2018; Chinazzi et al., 2020; Mizumoto et al., 2020; Arino & Portet, 2020). In the specific context of COVID-19, we propose a SEAIR epidemic model with a time delay that follows a discrete random probability distribution (Figure 1).
(A2) The government's containment effort has a damping effect on the virus reproduction rate, and this effect can be modeled through an exponentially decreasing function (Lanteri et al., 2020).
(A3) Asymptomatic individuals are hard to record, and their number accounts for a certain ratio of all infected people (Mizumoto et al., 2020).
(A4) Once infected people develop symptoms, they are recorded and quarantined on the day of onset, so they lose the ability to spread the virus after that day. By contrast, asymptomatic individuals can continue to infect others because they are difficult to identify.
Based on the above assumptions, we describe the dynamic process from group E to groups A and I, including a discrete time delay. The number of newly added symptomatic and asymptomatic cases on day t can be interpreted as the number of all exposed individuals who were infected before day t and become sick on day t. Following this logic, the sum of the newly added symptomatic and asymptomatic cases on day t equals the sum, over all incubation lengths k, of the number of individuals exposed k days earlier multiplied by the probability that an exposed individual has a k-day incubation period (Equation 1).
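As a rough illustration of the delay structure described above, the following Python sketch computes the expected number of new onsets per day as a weighted sum of earlier exposures. The function name `expected_onsets`, the input array `new_exposed`, and the toy numbers are assumptions made for this illustration and do not come from the paper; truncating the sum at `len(p)` days is likewise a simplification of this sketch.

```python
import numpy as np

def expected_onsets(new_exposed, p):
    """Expected new onsets (symptomatic plus asymptomatic) on each day.

    new_exposed[t] : people newly exposed on day t (hypothetical input)
    p[j - 1]       : probability of a j-day incubation period, j = 1..len(p)
    """
    new_exposed = np.asarray(new_exposed, dtype=float)
    p = np.asarray(p, dtype=float)
    onsets = np.zeros_like(new_exposed)
    for t in range(len(new_exposed)):
        for j in range(1, len(p) + 1):
            if t - j >= 0:
                # people exposed j days before day t who fall ill on day t
                onsets[t] += p[j - 1] * new_exposed[t - j]
    return onsets

# Toy example: 10 people exposed on day 0, incubation of 1 or 2 days
# with equal probability.
onsets = expected_onsets([10, 0, 0, 0], [0.5, 0.5])
print(onsets)  # -> [0. 5. 5. 0.]
```

In this toy case, half of the ten exposed individuals become sick one day after exposure and the other half two days after exposure.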
Because of assumption A3, the number of new asymptomatic cases is a fixed ratio of the number of new symptomatic cases, so Equation (1) can be rewritten in terms of the recorded symptomatic cases alone (Equation 2). To make this equation calculable, we set a threshold k, which represents the maximum length of the incubation period that this epidemic could have. In other words, the probability that an individual exposed before day t - k falls ill on day t is zero. In a time window of T days, this problem can be transformed into a problem of solving a system of multivariate linear equations. Given a time series of length T, the daily case counts and the lagged counts of exposed individuals are arranged in matrix form, with P denoting the vector of incubation probabilities. We can estimate P by minimizing the least-squares cost function of these linear equations. However, because the objective function is high-dimensional, it is likely to have many local minima, which makes the task hard for standard optimization methods because of their strong dependence on the initial condition. We therefore employ the basin-hopping algorithm, a global optimization method introduced by Wales and Doye (1997), to obtain the estimate of P.
We retrieve the publicly available data on confirmed, recovered, and death cases of COVID-19 from the ministries of health of eight countries from January 21 to April 13. Yang et al. (2020) ... According to Mizumoto et al. (2020), the estimated asymptomatic proportion among all infected cases is about 17.9%. Thus, we set the ratio of asymptomatic to symptomatic patients to 22% (that is, 1 to 4.56). Besides, since several epidemiological studies have shown that the incubation period of COVID-19 is rarely longer than 20 days (99th percentile) (Backer et al., 2020), we set the threshold k to 20. We use the data from the first month after the outbreak in each country to calculate P and use spline interpolation to construct the probability distribution of the incubation period. Figure 2 shows the estimated incubation period distribution when the parameter governing the containment effect in assumption A2 is set to 0.1.
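The estimation step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `lagged_matrix`, `estimate_incubation`, `new_exposed`, and `alpha` are hypothetical, the series of newly exposed individuals is taken as given, `alpha` is read as the asymptomatic-to-symptomatic ratio, and the softmax reparameterization (which keeps the probabilities non-negative and summing to one) is a choice of this sketch. Only `scipy.optimize.basinhopping` corresponds to the basin-hopping method named in the text.

```python
import numpy as np
from scipy.optimize import basinhopping

def lagged_matrix(new_exposed, k):
    """Row for day t holds new_exposed[t-1], ..., new_exposed[t-k], t = k..end."""
    new_exposed = np.asarray(new_exposed, dtype=float)
    days = np.arange(k, len(new_exposed))
    return np.stack([new_exposed[days - j] for j in range(1, k + 1)], axis=1)

def softmax(z):
    """Map unconstrained values to a probability vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def estimate_incubation(new_symptomatic, new_exposed, k, alpha, niter=20):
    """Least-squares estimate of the incubation probabilities p_1..p_k.

    alpha is assumed to be the ratio of asymptomatic to symptomatic cases,
    so total new onsets are (1 + alpha) times the recorded symptomatic cases.
    """
    X = lagged_matrix(new_exposed, k)
    y = (1.0 + alpha) * np.asarray(new_symptomatic, dtype=float)[k:]
    scale = np.mean(y ** 2)  # normalize so the cost is of order one

    def cost(z):
        resid = y - X @ softmax(z)
        return np.mean(resid ** 2) / scale

    result = basinhopping(cost, np.zeros(k), niter=niter,
                          minimizer_kwargs={"method": "L-BFGS-B"})
    return softmax(result.x)

# Synthetic check: generate noise-free case counts from a known distribution
# and try to recover it (all numbers here are made up for illustration).
rng = np.random.default_rng(0)
new_exposed = rng.integers(50, 500, size=40).astype(float)
p_true = np.array([0.1, 0.4, 0.3, 0.2])
alpha = 0.22
new_symptomatic = np.zeros(40)
new_symptomatic[4:] = lagged_matrix(new_exposed, 4) @ p_true / (1.0 + alpha)
p_hat = estimate_incubation(new_symptomatic, new_exposed, k=4, alpha=alpha)
print(np.round(p_hat, 3))
```

With real data the series of newly exposed individuals is not observed and would have to come from the epidemic model itself, so the recovered probabilities inherit the sensitivity to model parameters that the paper discusses.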
In Figure 2, the left panel presents the results for each country and the overall mean across these countries, and the inset in the top right corner presents the change in the reproduction number when the containment parameter equals 0.1. The shaded part of the right panel is the 95% pointwise confidence interval (CI) of the estimated distribution. As Figure 2 shows, the inference results are broadly similar across all the studied countries. The estimated incubation period distribution reaches its peak on the fourth day, meaning that a person infected with COVID-19 is most likely to have a four-day incubation period. However, in the real world we cannot identify the 'real' value of the containment parameter, which indicates the effect of countries' measures on suppressing the spread of the epidemic. Accordingly, we also perform a sensitivity analysis on this parameter to test the stability of our estimation; the results are shown in Figure 3. With five different values of the parameter, the predicted distribution maintains a similar shape and a consistent peak position, but as the parameter becomes larger, the distribution becomes smoother. We list the expectation, the median, and the 95th percentile under these five conditions in Table 1.
Table 1 here
Compared with several current studies of the incubation period of COVID-19 (Backer et al., 2020; Linton et al., 2020), our inferred results are quite close to theirs, especially when the containment parameter equals 0.01 or 0.05. This might imply that, after the outbreak of COVID-19, most countries did not act efficiently and urgently to suppress the spread of the virus.
In this paper, we introduce an epidemic-model-based approach to infer the distribution of the incubation period of COVID-19, and our results are in line with existing studies. The contributions of this paper are twofold.
First, based on the transmission process depicted by the established epidemic model, our proposed method infers the distribution of the incubation period from the overall daily confirmed case numbers reported by each country's government. Our method therefore avoids the drawbacks associated with traditional survey-based methods. Second, unlike traditional methods, our approach does not need prior assumptions about the distribution of the incubation period, and it provides an estimate that is, to some extent, closer to the "real" situation. Nevertheless, the proposed approach relies on prior knowledge of COVID-19, such as the reproduction rate, the asymptomatic proportion, and the government's containment effort, to depict the transmission process, so our results are sensitive to the parameter settings. Thus, to make reliable inferences using this method, information about the virus provided by prior research is necessary.
A simple model for COVID-19
Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China
Computational uncertainty quantification for random time-discrete epidemiological models using adaptive gPC
The effect of travel restrictions on the spread of the 2019 novel coronavirus
Clinical Characteristics of Coronavirus Disease 2019 in China
How macroscopic laws describe complex dynamics: asymptomatic population and CoviD-19 spreading
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
Estimation of incubation period distribution of COVID-19 using disease onset forward time: a novel cross-sectional and forward follow-up study
Coronavirus Travel Restrictions
Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms
A contribution to the mathematical theory of epidemics
Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China
Table 1. Expectation and 95th percentile in different