key: cord-0979729-somid5jt authors: Yin, Ming-Ze; Zhu, Qing-Wen; Lü, Xing title: Parameter estimation of the incubation period of COVID-19 based on the doubly interval-censored data model date: 2021-06-18 journal: Nonlinear Dyn DOI: 10.1007/s11071-021-06587-w sha: 0c04c46221fa42b0c4bfd72f477c3805bd0b0a6e doc_id: 979729 cord_uid: somid5jt

With the spread of the novel coronavirus disease 2019 (COVID-19) around the world, estimating the incubation period of COVID-19 has become a topic of intense interest. Based on the doubly interval-censored data model, we assume that the incubation period follows a lognormal or Gamma distribution, and estimate its parameters using maximum likelihood estimation, the expectation maximization algorithm and a newly proposed algorithm (the expectation mostly conditional maximization algorithm, referred to as ECIMM). The main innovation of this paper lies in two aspects. First, we treat the sample data of the incubation period as doubly interval-censored data without unnecessary data simplification, to improve the accuracy and credibility of the results. Second, our new ECIMM algorithm enjoys better convergence and universality than the other methods. Within the framework of this paper, we conclude that a 14-day quarantine period can largely interrupt the transmission of COVID-19; however, people who need special monitoring should be isolated for about 20 days for the sake of safety. The results provide some suggestions for the prevention and control of COVID-19. The newly proposed ECIMM algorithm can also be used to deal with the doubly interval-censored data model appearing in various fields.

In late 2019 and early 2020, a number of patients infected with the novel coronavirus disease 2019 (COVID-19) were successively found in Wuhan, Hubei province, China [1, 2]. This newly discovered virus causes severe acute respiratory disease.
On March 11, 2020, the World Health Organization (WHO) declared COVID-19 a pandemic [3]. Over time, the COVID-19 epidemic has spread very rapidly all over the world. The distribution of COVID-19 cases by country is shown in Fig. 1. Epidemic prevention work is being taken seriously by more and more countries [2]. In many countries, drastic restrictive measures have not prevented the outbreak of new pandemic waves [3, 4]. Despite worldwide efforts to control the growth of COVID-19, the number of COVID-19 cases is still rising at a reproduction rate of 3.77 [6, 7]. The outbreak evolution for the currently most affected countries is shown in Fig. 2. The mortality rate among cases infected with COVID-19 is 5.25% worldwide. This mortality rate is 7.60% in the European region, 2.24% in the Eastern Mediterranean region, 2.22% in the African region, 2.95% in the South-East Asia region, 5.07% in the region of the Americas, and 3.55% in the Western Pacific region [6, 8]. COVID-19 has become a worldwide pandemic affecting 219 countries, with an estimated total of more than 159 million infected cases and over 3.3 million deaths (WHO Coronavirus Disease Dashboard, May 12, 2021) [11, 12].

To protect against the COVID-19 epidemic, we must have a deep understanding of the basic characteristics of COVID-19, among which one of the most important is the incubation period of the virus [2, 14]. The incubation period of COVID-19 is the period from infection to the earliest appearance of clinical symptoms in COVID-19 patients [15, 16]. Estimating the virus incubation period is of great significance for epidemiological investigation and for the development of epidemic prevention and control measures [17]. The incubation period helps define the time period over which contact tracing is to be done [18].
It also helps in actively monitoring people with higher exposure and in determining the length of active monitoring so as to save resources [19]. Knowledge of the incubation period distribution is also necessary for estimating the size and transmission potential of COVID-19 outbreaks [17].

When fitting the parameters of the virus incubation period distribution, researchers usually simplify the data structure by treating the data as singly interval-censored data or exact data in order to reduce the difficulty of data analysis [17]. In Refs. [20, 21], the sample data of the incubation period of the virus are treated as doubly interval-censored data without unnecessary simplification of the data structure, to improve the accuracy of the research. In Refs. [17, 22], maximum likelihood estimation (MLE) and Bayesian estimation are widely applied to the doubly interval-censored data model. It has been shown that the doubly interval-censored data model makes the research results more reliable.

Fast and effective algorithms contribute greatly to data analysis and processing [9, 10]. The expectation maximization (EM) algorithm maximizes the likelihood function through an iterative procedure to obtain the parameter estimates [23, 24]. Some extensions of the EM algorithm have been proposed and widely applied as supplements to the EM algorithm theory [25]. The expectation conditional maximization (ECM) algorithm solves the problem of multi-parameter estimation by approaching the optimal estimates step by step [26], and the expectation mostly maximization (EMM) algorithm accelerates the convergence of the iterative algorithm by improving the expectation function in the doubly interval-censored data model.
To make the EM algorithm more suitable for processing doubly interval-censored data, we propose a new algorithm, named the expectation mostly conditional maximization (ECIMM) algorithm, to estimate the parameters of the COVID-19 incubation period.

In this paper, the sample data of the incubation period of COVID-19 are regarded as doubly interval-censored data [20, 21]. The incubation period of COVID-19 is fitted by the lognormal distribution and the Gamma distribution based on the open data on the COVID-19 incubation period collected so far [27]. In statistical research, maximum likelihood estimation and the EM algorithm are mature parameter estimation methods [25, 26, 28]. The newly proposed ECIMM algorithm enjoys better universality and convergence than the related basic algorithm. We use maximum likelihood estimation, the EM algorithm and the ECIMM algorithm to estimate the parameters of the incubation period of COVID-19 [17, 26], and propose some suggestions for the prevention and control of the COVID-19 epidemic.

The rest of this paper is organized as follows. We introduce the data background in Sect. 2, and maximum likelihood estimation and the EM algorithm in Sect. 3. The ECIMM algorithm is proposed in Sect. 4. In Sect. 5, we use the three methods to estimate the parameters and propose suggestions for epidemic prevention. In Sect. 6, we compare our parameter estimation results with those of others. Finally, the conclusion and future work are presented in Sect. 7.

Current common estimates of the incubation period of COVID-19 are mostly based on studies of accurate case data or simplified case data [17]. However, in actual data acquisition, accurate data cannot be obtained easily, and only the approximate intervals of the infection and onset times of patients can be investigated [17].
To obtain an accurate estimate of the COVID-19 incubation period, we estimate the parameters on the basis of the doubly interval-censored data model [17, 20, 21]. In this model, $E$ and $S$ denote the time when a patient infected with COVID-19 is exposed to the novel coronavirus and the time when symptoms occur, respectively. The incubation period is $T = S - E$. A typical observation consists of four time points, namely $X = (E_L, E_R, S_L, S_R)$, where the subscripts $L$ and $R$ denote the left and right endpoints of the intervals for $E$ and $S$, as shown in Fig. 3. When $E$ and $S$ are both intervals, the observation is called doubly interval-censored. When one of $E$ and $S$ is an exact value and the other is an interval, the observation is called singly interval-censored. When $E$ and $S$ are both exact values, the observation is exact.

Several distributions are suitable for modeling the incubation period, e.g., the lognormal distribution and the Gamma distribution [29, 30, 31]. In this paper, we assume in turn that the incubation period follows the lognormal and the Gamma distribution, and estimate the main parameters. As the mean and quantiles play an important role in research on the incubation period of COVID-19, we pay particular attention to the mean and quantiles of the novel coronavirus incubation period [32].

The data used in this paper come from the online repository [27]. The repository contains information on 3397 patients infected with COVID-19, such as the location, country, gender and age of each patient. It also includes the time when each patient was exposed to the novel coronavirus and the time when symptoms occurred, which are used to estimate the incubation period of the novel coronavirus in this paper.

The densities of $T$ and $E$ are denoted by $f_\theta(t)$ and $h_\lambda(v)$, respectively.
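The three data types defined above can be told apart mechanically from the four recorded time points. The following is a minimal Python sketch of that classification; the `Observation` class and its field names are hypothetical and are not taken from the repository [27], and times are assumed to be numeric day offsets from an arbitrary origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One record X = (E_L, E_R, S_L, S_R); hypothetical container, times in days."""
    e_left: float
    e_right: float
    s_left: float
    s_right: float

    def __post_init__(self):
        # Each interval must be ordered; equal endpoints mean an exact time.
        if self.e_left > self.e_right or self.s_left > self.s_right:
            raise ValueError("interval endpoints must satisfy left <= right")


def censoring_type(obs: Observation) -> str:
    """Classify a record as doubly interval-censored, singly interval-censored or exact."""
    exposure_is_interval = obs.e_right > obs.e_left
    onset_is_interval = obs.s_right > obs.s_left
    if exposure_is_interval and onset_is_interval:
        return "doubly interval-censored"
    if exposure_is_interval or onset_is_interval:
        return "singly interval-censored"
    return "exact"


def incubation_bounds(obs: Observation) -> tuple:
    """Shortest and longest T = S - E compatible with the record."""
    shortest = max(0.0, obs.s_left - obs.e_right)
    longest = obs.s_right - obs.e_left
    return shortest, longest


record = Observation(e_left=0.0, e_right=5.0, s_left=6.0, s_right=8.0)
print(censoring_type(record))     # doubly interval-censored
print(incubation_bounds(record))  # (1.0, 8.0)
```

The bounds returned by `incubation_bounds` only delimit the values of $T$ that a record allows; they do not say how likely each value is, which is what the likelihood-based treatment is for.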
In general, we can suppose that $E$ and $T$ are mutually independent and that $E$ follows a uniform distribution.

Fig. 3 The doubly interval-censored data

The likelihood functions of the doubly interval-censored data, the singly interval-censored data and the exact data are as follows [33]:

An observation in the sample may be doubly interval-censored, singly interval-censored or exact [17]. We introduce two indicator variables, $\sigma_i$ and $\omega_i$. When $\sigma_i = 1$, the observation is doubly interval-censored. When $\omega_i = 1$, the observation is singly interval-censored. When $\sigma_i = \omega_i = 0$, the observation is exact. The likelihood function of this observation is as follows [33]:

The likelihood function of the whole sample is obtained by multiplying the likelihood functions of all observations:

According to the idea of maximum likelihood, the best estimates of the parameters are the values that maximize the likelihood function:

The likelihood function of the whole sample is given by Eq. 5. In the rest of this paper, we write $L(\theta, \lambda)$ as $p(x|\theta, \lambda)$ in order to express the algorithm more clearly and concisely. The EM algorithm is an iterative algorithm [23, 24]. According to the theory of survival analysis, we introduce a latent variable $z$ to advance the parameter estimation process; $\log p(x, z|\theta, \lambda)$ is the log-likelihood function of the complete data. We calculate the mathematical expectation of the log-likelihood function, called the ELBO function. According to the idea of maximum likelihood, we obtain the best estimates of the parameters by finding the values that maximize the ELBO function:

E step:

The iterative process should be stopped once the convergence criterion falls within the allowable error range for the specific problem.

The likelihood function of the whole sample is given by Eq. 5.
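As a numerical illustration of the per-observation likelihood that is multiplied over the sample, the sketch below evaluates it for a lognormal incubation period using only the Python standard library. It is not the authors' implementation. It assumes that $E$ and $T$ are independent, that the exposure time is uniform on the observed window $[E_L, E_R]$ (rather than on a global interval), and that a simple midpoint rule is accurate enough for the integral over the exposure time. All function names and the `n_grid` parameter are illustrative.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """P(T <= t) when log T ~ Normal(mu, sigma^2)."""
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def lognormal_pdf(t, mu, sigma):
    """Density of T at t when log T ~ Normal(mu, sigma^2)."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def observation_likelihood(e_left, e_right, s_left, s_right, mu, sigma, n_grid=400):
    """Likelihood of one record X = (E_L, E_R, S_L, S_R) under the stated assumptions."""

    def onset_term(e):
        # Contribution of the onset information for a given exposure time e.
        if s_right > s_left:   # onset interval-censored: P(S_L - e < T <= S_R - e)
            return (lognormal_cdf(s_right - e, mu, sigma)
                    - lognormal_cdf(s_left - e, mu, sigma))
        return lognormal_pdf(s_left - e, mu, sigma)   # onset observed exactly

    if e_right > e_left:
        # Exposure interval-censored: average over the uniform exposure window
        # with a midpoint rule (the uniform density 1/(E_R - E_L) cancels the width).
        step = (e_right - e_left) / n_grid
        total = sum(onset_term(e_left + (k + 0.5) * step) for k in range(n_grid))
        return total / n_grid
    return onset_term(e_left)   # exposure observed exactly


def log_likelihood(sample, mu, sigma):
    """Log-likelihood of a list of (E_L, E_R, S_L, S_R) records."""
    return sum(math.log(observation_likelihood(*row, mu, sigma)) for row in sample)
```

Maximizing `log_likelihood` over `mu` and `sigma` with any numerical optimizer would give a maximum likelihood estimate under these assumptions. Note that `math.log` raises an error if a record has zero likelihood at the trial parameters.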
According to the theory of survival analysis, we introduce a latent variable $z$ with density function $q(z)$ to transform the log-likelihood function:

We take the density function of the latent variable $z$ to be an undetermined function $q(z)$, whereas the EM algorithm takes it to be the posterior density function $p(z|x, \theta^{(i)}, \lambda^{(i)})$. We obtain the ELBO function of the ECIMM algorithm by calculating the mathematical expectation of the log-likelihood function:

The following algorithm steps are based on the optimized ELBO function. In step $MM_1$, we fix the parameter values and obtain the optimal density estimate $\hat{q}(z)$ by maximizing the ELBO function. In step $MM_2$, we fix the density function of $z$ as $\hat{q}(z)$ and estimate the parameters by maximizing the ELBO function.

$MM_1$ step: Fix $\theta$, $\lambda$, and solve for $\hat{q}$:

$MM_2$ step: Fix $q = \hat{q}$, and accomplish the parameter estimation by the following steps:

Theoretically, we can obtain the parameter estimates by computing the partial derivatives of the ELBO function and setting them to zero. However, it is difficult to obtain those estimates when the ELBO function is multivariate, as it is in the doubly interval-censored data model. The ECIMM algorithm obtains the parameter estimates by approaching the optimal values step by step.

$CM_1$ step: Fix $\theta = \theta^{(i)}$, and

$CM_2$ step: Fix $\lambda = \lambda^{(i+1)}$, and

The iterative process should be stopped once the convergence criterion falls within the allowable error range for the specific problem.

Compared with the EM algorithm, which directly takes the density function of the latent variable $z$ to be the posterior conditional density, the ECIMM algorithm treats it as an undetermined function $q(z)$ for research and analysis.
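The contrast between fixing the density of $z$ at the posterior (EM) and leaving it as an undetermined $q(z)$ (ECIMM) can be read against the standard variational decomposition of the log-likelihood. The identity below is the textbook form, written with the symbols of this section. It is offered as a reading aid under the assumption that the ELBO has its usual definition, and it is not a reconstruction of the authors' own expressions.

```latex
\begin{aligned}
\log p(x \mid \theta, \lambda)
  &= \underbrace{\int q(z)\,\log\frac{p(x, z \mid \theta, \lambda)}{q(z)}\,dz}_{\mathrm{ELBO}(q,\,\theta,\,\lambda)}
   + \underbrace{\int q(z)\,\log\frac{q(z)}{p(z \mid x, \theta, \lambda)}\,dz}_{\mathrm{KL}\left(q \,\|\, p(\cdot \mid x, \theta, \lambda)\right)\;\ge\;0}
\end{aligned}
```

Because the Kullback-Leibler term is non-negative, the ELBO never exceeds the log-likelihood, and for fixed $(\theta, \lambda)$ it is maximized over $q$ when $q(z) = p(z \mid x, \theta, \lambda)$, where the bound is tight. Under this reading, a step that maximizes the ELBO over $q$ with the parameters fixed recovers the posterior density used by EM, and the subsequent parameter updates with $q$ fixed raise the ELBO and therefore do not decrease the log-likelihood.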
Since the density function $q(z) \le 1$ and the integral $\int \log q(z)\,dz \le 0$, the ELBO function of the ECIMM algorithm is more suitable for parameter estimation based on the doubly interval-censored data model:

Because the ECIMM algorithm optimizes the ELBO function, the iterates can be closer to the true values of the parameters. According to maximum likelihood theory, the ECIMM algorithm reduces the number of iteration steps and accelerates the convergence of the algorithm:

At the same time, in the steps that maximize the ELBO function to obtain the parameter estimates, the basic idea is to set the partial derivatives to zero and solve the equations:

However, since the ELBO function in the doubly interval-censored data model is always a multivariate function, the algorithm is hard to realize because of the complexity of the calculation. The ECIMM algorithm addresses this by approaching the optimal parameter estimates step by step. This makes the algorithm much easier to implement, so that it can be widely used to solve various problems:

arg max

Overall, by improving the ELBO function in the doubly interval-censored data model and approaching the optimal estimates step by step, we accelerate the convergence of the algorithm and improve its universality.

When describing the algorithm for fitting a specific distribution, we denote the ELBO function as the $Q$ function in order to reflect intuitively the values of the parameters during the iteration. Assuming that $E$ follows a uniform distribution on $(a, b)$ and $T$ follows a lognormal distribution, we apply the ECIMM algorithm to estimate the parameters of the novel coronavirus incubation period:

$MM_2$ step: Fix $q = \hat{q}$, and accomplish the parameter estimation by the following steps:

It completes the process of

The iterative process should be stopped once the convergence criterion falls within the allowable error range for the specific problem.
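The step-by-step idea behind the conditional maximization steps, updating one block of parameters while the other is held fixed, can be illustrated with a short sketch. This is plain coordinate-wise maximization of an observed-data log-likelihood, not the ECIMM algorithm itself. For brevity each record is reduced to lower and upper bounds on $T$ (an assumption made only to keep the example short), the two blocks are the lognormal parameters `mu` and `sigma`, and the one-dimensional searches use golden-section search. All names, search ranges and the toy sample are hypothetical.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-6):
    """Return an approximate maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:            # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def lognormal_cdf(t, mu, sigma):
    """P(T <= t) when log T ~ Normal(mu, sigma^2)."""
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def interval_log_likelihood(bounds, mu, sigma):
    """Sum of log P(lower < T <= upper) over records given as (lower, upper)."""
    total = 0.0
    for lower, upper in bounds:
        prob = lognormal_cdf(upper, mu, sigma) - lognormal_cdf(lower, mu, sigma)
        total += math.log(max(prob, 1e-300))   # guard against log(0)
    return total


def conditional_maximization(objective, first, second, first_range, second_range,
                             tol=1e-8, max_iter=200):
    """Alternate one-dimensional maximizations until the objective stops changing."""
    value = objective(first, second)
    for iteration in range(1, max_iter + 1):
        # Conditional step 1: hold the first block fixed, update the second.
        second = golden_section_max(lambda v: objective(first, v), *second_range)
        # Conditional step 2: hold the new second block fixed, update the first.
        first = golden_section_max(lambda v: objective(v, second), *first_range)
        new_value = objective(first, second)
        if abs(new_value - value) < tol:
            break
        value = new_value
    return first, second, iteration


# Toy bounds on T in days (hypothetical, for illustration only).
bounds_on_t = [(2.0, 5.0), (4.0, 8.0), (1.0, 3.0), (6.0, 10.0), (3.0, 7.0)]
mu_hat, sigma_hat, n_iter = conditional_maximization(
    lambda mu, sigma: interval_log_likelihood(bounds_on_t, mu, sigma),
    first=1.0, second=1.0, first_range=(0.0, 3.0), second_range=(0.05, 3.0))
print(mu_hat, sigma_hat, n_iter)
```

Each pass only requires one-dimensional searches, which is what makes this kind of scheme easier to implement than solving the full system of score equations at once. The price is that several passes may be needed before the objective stops changing.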
Assuming that $E$ follows a uniform distribution on $(a, b)$ and $T$ follows a Gamma distribution, we apply the ECIMM algorithm to estimate the parameters of the novel coronavirus incubation period:

$MM_2$ step: Fix $q = \hat{q}$, and accomplish the parameter estimation by the following steps:

It completes the process of

The iterative process should be stopped once the convergence criterion falls within the allowable error range for the specific problem.

We estimate the incubation period of COVID-19 using the doubly interval-censored data model with sample sizes of 50, 200 and 500 [17, 20, 21]. We carry out the simulation with the maximum likelihood estimation method, the EM algorithm and the ECIMM algorithm under the lognormal and Gamma distribution hypotheses, and the results are shown in Figs. 4, 5, 6, 7, 8, and 9. The parameter estimation results for the incubation period obtained with these three methods under the lognormal and Gamma distribution hypotheses are listed in Tables 1, 2, 3, 4, 5 and 6.

In each figure, the simulation results obtained by the three methods are close to each other, indicating that our new method is reasonable. As the sample size increases, the simulation results become more reliable. The simulation results under the lognormal assumption differ from those under the Gamma assumption because of the different characteristics of the two distributions.

As the mean and quantiles play an important role in research on the incubation period of COVID-19, we estimate the mean and quantiles of the novel coronavirus incubation period. Tables 1, 2 and 3 list the estimation results of the three estimation methods under the lognormal assumption, with sample sizes of 50, 200 and 500, respectively, while Tables 4, 5 and 6 list the estimation results of the three estimation methods under the Gamma assumption, with sample sizes of 50, 200 and 500, respectively.
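Once the lognormal parameters have been estimated, the mean and the quantiles of the incubation period follow from closed-form expressions. The sketch below shows that computation with Python's standard library. The function names are illustrative, and the parameter values in the usage lines are hypothetical placeholders, not the estimates reported in the tables.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # standard normal distribution, used for z-scores


def lognormal_mean(mu, sigma):
    """Mean of T when log T ~ Normal(mu, sigma^2)."""
    return math.exp(mu + 0.5 * sigma ** 2)


def lognormal_quantile(p, mu, sigma):
    """Value t with P(T <= t) = p, for 0 < p < 1."""
    return math.exp(mu + sigma * _STD_NORMAL.inv_cdf(p))


def prob_incubation_exceeds(days, mu, sigma):
    """P(T > days): chance that symptoms appear after a quarantine of that length."""
    if days <= 0.0:
        return 1.0
    return 1.0 - _STD_NORMAL.cdf((math.log(days) - mu) / sigma)


# Hypothetical parameter values, for illustration only.
mu, sigma = 1.6, 0.5
print("mean:", lognormal_mean(mu, sigma))
for p in (0.975, 0.995):
    print(f"{p:.1%} quantile:", lognormal_quantile(p, mu, sigma))
print("P(T > 14 days):", prob_incubation_exceeds(14.0, mu, sigma))
```

The same quantities for a Gamma fit need the inverse of the regularized incomplete gamma function, which the standard library does not provide, so a numerical library would be required there.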
We find that the quantile values under the lognormal assumption are larger than those under the Gamma assumption, which is due to the different characteristics of the two distributions. According to Table 3, under the lognormal distribution hypothesis, the average incubation period is about 6.8 days, and the probability that the incubation period does not exceed 15.31 days is 0.975. According to Table 6, under the Gamma distribution hypothesis, the average incubation period is about 6.5 days, and the probability that the incubation period does not exceed 13.84 days is 0.975. These results show that a 14-day quarantine period can largely interrupt the transmission of COVID-19, which fits within the incubation period range of 0 to 14 days assumed by the WHO and is consistent with current medical control measures [1].

With the improvement of epidemic prevention and control, the situation has stabilized in some areas [1]. People from high-risk areas need to be specially monitored. According to Table 3, the probability that the incubation period does not exceed 21.56 days is 0.995 under the lognormal distribution hypothesis. According to Table 6, the probability that the incubation period does not exceed 17.19 days is 0.995 under the Gamma distribution hypothesis. The results suggest that, for the sake of safety, people who need special monitoring should be isolated for about 20 days. The probability that a patient infected with COVID-19 shows no symptoms during a 20-day quarantine period is about 0.5% or less. Therefore, the new quarantine period can effectively block the spread of COVID-19.

1. We estimate the parameters of the COVID-19 incubation period based on the doubly interval-censored data model, which makes the research results more reliable.
2. Our new ECIMM algorithm enjoys good convergence and universality.
3. Our results can be regarded as a valuable supplement to COVID-19 prevention.

We compare our results with those of others, as listed in Table 7. Backer uses the Bayesian estimation method to estimate the main parameters with a sample size of 88, assuming that the incubation period follows the lognormal, Gamma and Weibull distributions [19]. Qiu uses maximum likelihood estimation and the Bayesian estimation method to estimate the main parameters with a sample size of 543, assuming that the incubation period follows the lognormal, Gamma and Weibull distributions [17]. They estimate the mean and the 97.5% quantile, while we estimate the mean, the 97.5% quantile and the 99.5% quantile. They estimate the parameters using maximum likelihood estimation and the Bayesian estimation method, while we use the maximum likelihood estimation method, the EM algorithm and the ECIMM algorithm to estimate the main parameters.

Comparing the results, we find that the mean values are all between 6 and 7 days; in other words, they are consistent with each other. Based on the lognormal distribution assumption, the 97.5% quantile values obtained

Comparing the results under the Gamma and lognormal distribution assumptions, we find that the quantile values under the lognormal assumption are significantly greater than those under the Gamma assumption. This is due to the different characteristics of the two distributions: the lognormal distribution has a greater degree of dispersion than the Gamma distribution, so the quantile results under the lognormal assumption are more conservative.

We compare the results obtained by the three methods in our research and find that they are similar, which indicates that our new method is reasonable. However, the ECIMM algorithm shows fast convergence in dealing with the doubly interval-censored data of the COVID-19 incubation period, and it can be widely used to deal with doubly interval-censored data in various fields.
We hope that the research in this paper can serve as a useful supplement to related studies on the incubation period of COVID-19 and be helpful for the prevention and control of COVID-19.

In this paper, we have estimated the parameters of the COVID-19 incubation period based on the doubly interval-censored data model. Statistical inference has been conducted for the lognormal distribution and the Gamma distribution. The maximum likelihood estimation method, the EM algorithm and the ECIMM algorithm have been used for parameter estimation. Each parameter estimation method for each distribution has been theoretically derived and can be implemented in mainstream programming software. We have obtained estimates of the mean, the 97.5% quantile and the 99.5% quantile, and suggested that a 14-day quarantine period can largely interrupt the transmission of COVID-19; however, people who need special monitoring should be isolated for about 20 days for the sake of safety. The research results can be regarded as a supplement to COVID-19 prevention.

Instead of simplifying the data structure, we regard the sample data of the incubation period as doubly interval-censored data, which makes the results more accurate and reliable. Furthermore, we propose a new algorithm, the ECIMM algorithm, which has good convergence and universality. The ECIMM algorithm shows fast convergence when dealing with the doubly interval-censored data of the COVID-19 incubation period, and it can be widely used to deal with doubly interval-censored data in various fields.

In future studies, we will extend the research results by estimating the parameters of the incubation period of COVID-19 under other distribution assumptions. Few researchers have applied the ECIMM algorithm in current studies. We encourage further studies of its accuracy and convergence rate as future work.
And we hope our work in this paper contributes to the prevention and control of COVID-19 and other epidemics.

The impact of the coronavirus disease of 2019 on research
A new SAIR model on complex networks for analysing the 2019 novel coronavirus (COVID-19)
Quickest Detection and Forecast of Pandemic Outbreaks: Analysis of COVID-19 Waves
Severe effects of the COVID-19 confinement on young children's sleep: A longitudinal study identifying risk and protective factors
A new clustering method for the diagnosis of CoVID19 using medical images
The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak
Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique
Performance of metaheuristic optimization algorithms based on swarm intelligence in attitude and altitude control of unmanned aerial vehicle for path following
Adaptive control of a 3D printer using whale optimization algorithm for bio-printing of artificial tissues and organs
COVID19-inhibitory activity of withanolides involves targeting of the host cell surface receptor ACE2: insights from computational and biochemical assays
Statistical Data Analysis on the Incubation and Suspected Period of COVID-19 Based on 2172 Confirmed Cases Outside Hubei Province
Nilam: Nonlinear dynamics of a time-delayed epidemic model with two explicit aware classes, saturated incidences, and treatment
Epidemic dynamics of influenza-like diseases spreading in complex networks
Parametric Estimation for the Incubation Period Distribution of COVID-19 under Doubly Interval Censoring
Incubation period for COVID-19: a systematic review and meta-analysis
Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China
Analysis of doubly-censored survival data, with application to AIDS
Estimation of induction distributions with doubly censored data and application to AIDS
A Bayesian analysis of multivariate doubly interval-censored dental data
The EM Algorithm and Extensions, 2nd edn
Parameter Estimation of Exponential Distribution Under Fixed-time Interval Censoring
Statistical analysis methods and applications of the interval-censored survival data
Statistical Inference of Several Distributions under Interval Censored Data and Empirical Analysis of Financial Data
Laboratory for the modeling of biological and socio-technical systems (MOBS)
Stochastic EM algorithm for doubly interval-censored data
Estimating the probability distribution of the incubation period for rabies using data from the 1948-1954 rabies epidemic in Tokyo
Determination of the appropriate quarantine period following smallpox exposure: An objective approach using the incubation period distribution
Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data
The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application
Maximum likelihood for interval censored data: consistency and computation

Data availability statement: Our manuscript has no associated data.

The authors declare that they have no conflict of interest.