key: cord-0713544-gxk5t1yp
authors: Jiang, Ally Bi-zhu; Lieu, Richard; Quenby, Siobhan
title: Significantly longer Covid-19 incubation times for the elderly, from a case study of 136 patients throughout China
date: 2020-04-18
journal: nan
DOI: 10.1101/2020.04.14.20065896
sha: 3dac6b5cc9ca97a44c1553cd96d6ba69593296eb
doc_id: 713544
cord_uid: gxk5t1yp

Objective: To infer the Covid-19 incubation time distribution from a large sample.

Method: Based on individual case data published online by 21 cities of China, we investigated a total of 136 COVID-19 patients who traveled to Hubei from these cities between January 5 and January 31, 2020, remained there for 48 hours or less, and returned to these cities with onset of symptoms between January 10 and February 6, 2020. Among these patients, 110 were aged 15 to 64, 22 were aged 65 to 86, and 4 were aged under 15.

Findings: The differential incubation time histograms of the two age groups 15 to 64 and 65 to 86 are adequately fitted by the log normal model. For the 15 to 64 age group, the median incubation time of $7.00^{+1.10}_{-0.90}$ days (uncertainties are 95 % CL) is broadly consistent with previous literature. For the 65 to 86 age group, the median of $10.9^{+2.7}_{-2.0}$ days is statistically significantly longer. Moreover, for this group, the 95 % confidence contour fails to close at its upper end, which indicates that the data cannot constrain the upper bounds of the log normal parameters $\mu$ and $\sigma$; this is because the sample has a maximum incubation time of 17 days, beyond which we ran out of data even though the histogram has not yet peaked. Thus there is the potential of a much longer incubation time for the 65 to 86 age group than 10 to 14 days. Only a much larger sample can settle this.

The incubation time of Covid-19 [1] and the closely related question of asymptomatic case numbers are two topics of major interest and concern.
On the former, the results presented here for the main age group of 15-64 broadly corroborate previous studies [2, 3, 4, 5, 6], but for the elderly group of [...] Hubei: patient case number, age, sex, first and last day in Hubei, and first day with symptoms. In the current investigation we included only those COVID-19 patients who stayed in Hubei for at most two calendar days. The day of exposure was taken as the first day in Hubei if the patient stayed in Hubei for one calendar day, or as the midpoint of the first and second days in Hubei if the patient stayed for two calendar days. By excluding COVID-19 patients who stayed in Hubei for more than two days, one can better define the day of exposure. The incubation period for each COVID-19 patient is inferred as the number of days between exposure and symptom onset.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. medRxiv preprint doi: https://doi.org/10.1101/2020.04.14.20065896

As will be shown below, the distribution of incubation times may adequately be fitted with a log normal distribution for the two age groups mentioned above, suggesting that the incubation time $\tau$ (in days) is a multiplicative variate. Specific to the problem of Covid-19, it seems reasonable to envisage an inverse proportionality relationship between the virus growth rate $1+r$ and $\tau$, and the sample average growth rate is [...] where $N$ is the sample size (or number of measurements of the incubation time) [...] or, equivalently, both $\ln(1+r)$ and $\ln\tau$ are additive variates. Moreover, if $\ln(1+r_j)$ is normally distributed because $1+r_j$ is itself the geometric average of many successive growth rates of the virus inside the human body (i.e.
the Central Limit Theorem may apply to $\ln(1+r_j)$), the distribution of $\tau_j$ would then be a log normal (footnote 1), with $\ln\tau_j$ having (arithmetic) mean $\mu$ and standard deviation $\sigma$. Thus the expected number of cases within some incubation time interval $k$, or incubation time $\tau_k = k\tau_0$, is [...] with the coefficient $1/\tau_k$ originating from the relationship between logarithmic and linear intervals, viz. $d\ln\tau = d\tau/\tau$. The applicability of the log normal model to the Covid-19 incubation time distribution compels one to calculate the mean incubation time as the geometric mean (2), at least as an alternative, as we shall do in the following section.

Footnote 1: Under this assumption one can also derive (3) without enlisting the Central Limit Theorem, by considering the viral interaction with the human body as a thermodynamic process with a fixed mean and variance, i.e. one which maximizes the entropy $-\sum_j p_j \ln p_j$ subject to the constraints $\sum_j p_j = 1$, $\sum_j p_j \ln\tau_j = \mu$, and $\sum_j p_j (\ln\tau_j)^2 = \sigma^2 + \mu^2$.

The usual way of fitting a multi-parameter model to the data is by minimizing the $\chi^2$ statistic

$$\chi^2 = \sum_k \frac{(n_k - \tilde{n}_k)^2}{\sigma_k^2} \qquad (4)$$

w.r.t. the model parameters $\alpha, \beta, \gamma, \cdots$, where $n_k \equiv n_k(\alpha, \beta, \gamma, \cdots)$ is the number of cases for incubation time interval $k$ as expected by the model, $\tilde{n}_k$ is the observed number of cases, and $\sigma_k^2 = n_k$ is the expected model variance assuming Poisson fluctuation in the case counts. In the case of the log normal model distribution under scrutiny here, which has three parameters $\alpha = \mu$, $\beta = \sigma$, and $\gamma = n_0$ (where $\mu$, $\sigma$, and $n_0$ are as in (3)), the 95 % confidence intervals $\delta\mu$, $\delta\sigma$, and $\delta n_0$ for the best-fit parameters may be inferred from the $\Delta\chi^2$ criterion discussed below. For a log normal quantile $q = q(m) = e^{\mu + m\sigma}$, $\Delta\chi^2$ also yields $\delta q$.
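The $\chi^2$ fit described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the log normal model has the form $n_k = (n_0/\tau_k)\exp[-(\ln\tau_k - \mu)^2/(2\sigma^2)]$ with $\tau_0 = 1$ day and with $n_0$ absorbing all normalization constants. It replaces an analytic minimization by a brute-force grid search over $\mu$ and $\sigma$, and it runs on synthetic, noise-free counts rather than on the patient data. All function names and grid ranges are hypothetical.

```python
import math


def lognormal_shape(mu, sigma, m, tau0=1.0):
    """Unnormalised model shape s_k for bins k = 1..m, with tau_k = k * tau0.

    Assumed form: s_k = exp(-(ln tau_k - mu)^2 / (2 sigma^2)) / tau_k.
    """
    shape = []
    for k in range(1, m + 1):
        tau_k = k * tau0
        z = (math.log(tau_k) - mu) / sigma
        shape.append(math.exp(-0.5 * z * z) / tau_k)
    return shape


def chi2(observed, expected):
    """Chi-square with Poisson model variance sigma_k^2 = n_k."""
    return sum((n_k - obs) ** 2 / n_k for obs, n_k in zip(observed, expected))


def best_amplitude(observed, shape):
    """Amplitude n0 that minimises chi2 for a fixed shape (d chi2 / d n0 = 0)."""
    return math.sqrt(sum(o * o / s for o, s in zip(observed, shape)) / sum(shape))


def grid_fit(observed, mu_grid, sigma_grid):
    """Return (chi2_min, mu, sigma, n0) from a brute-force scan of mu and sigma."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            shape = lognormal_shape(mu, sigma, len(observed))
            n0 = best_amplitude(observed, shape)
            value = chi2(observed, [n0 * s for s in shape])
            if best is None or value < best[0]:
                best = (value, mu, sigma, n0)
    return best


# Synthetic check: noise-free counts generated from the assumed model itself.
mu_grid = [1.5 + 0.01 * i for i in range(101)]    # hypothetical scan range
sigma_grid = [0.2 + 0.01 * i for i in range(81)]  # hypothetical scan range
true_mu, true_sigma, true_n0 = mu_grid[45], sigma_grid[30], 40.0
counts = [true_n0 * s for s in lognormal_shape(true_mu, true_sigma, 17)]

chi2_min, mu_fit, sigma_fit, n0_fit = grid_fit(counts, mu_grid, sigma_grid)
print(f"mu = {mu_fit:.2f}, sigma = {sigma_fit:.2f}, "
      f"median = {math.exp(mu_fit):.2f} days")
```

For each pair $(\mu, \sigma)$ the amplitude $n_0$ that minimizes $\chi^2$ follows in closed form from $\partial\chi^2/\partial n_0 = 0$, which is why only two parameters need to be scanned in this sketch.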
Yet an alternative method is to take advantage of the independence of $\mu$ and $\sigma$ in the model by writing [...]. It turns out, however, that the fitting technique of (4) leads to large uncertainties in some parameters of interest, including those computed within the dynamic range of incubation times as set by the data. This is especially the case for the second of the two age groups, consisting of patients aged 65-86 years, where the incubation time span of the model is significantly wider than that of the data. Below we present a slight variation of the method in (4) which avoids the problem.

To facilitate introducing the modified model fitting algorithm we first remind the reader of the standard maximum likelihood method (footnote 2), which relies upon the $n_k \gg 1$ limit for all time intervals $k$, the limit where a Poisson distribution of counts tends to a Gaussian with equality between variance and mean. Thus the likelihood of a match between the number of counts $n_k$ predicted by the model and the observational data $\tilde{n}_k$ for all incubation time intervals $k$ is given by the conditional probability

$$P(\mathrm{data}\,|\,\mathrm{model}) = \prod_{k=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left[-\frac{(n_k - \tilde{n}_k)^2}{2\sigma_k^2}\right], \qquad \sigma_k^2 = n_k, \qquad (6)$$

and one's task is to maximize $P(\mathrm{data}\,|\,\mathrm{model})$ w.r.t. the model parameters $\alpha, \beta, \gamma, \cdots$, where $m$ is the total number of time intervals spanned by the data.

Footnote 2: The maximum likelihood method only works when the measurement intervals are independent. Thus it is incorrect to apply the model to fitting the data of a cumulative (or integral) distribution, in which the counts of previous intervals affect the later ones.

If the number of counts per bin does not satisfy $n_k \gg 1$, however, the distribution of counts will not be normal, i.e.
it will remain genuinely Poisson rather than Gaussian, in which case the expression on the right side of (6) is not strictly valid. In this limit, which does apply to the second of our two age groups (viz. ages 65-86), one should use likelihood ratios [7] instead of (6), but because the differences between the two sets of output parameter values are small compared to their uncertainties there is no real advantage in deviating from (6); we simply feel that for completeness' sake this subtle point should be mentioned.

The maximization of (6) w.r.t. model parameters is obviously the same as minimizing $\chi^2$ as given in (4). Explicitly, if one writes, in the context of the log normal model (3), [...] the procedure would be equivalent to solving

$$\frac{\partial\chi^2}{\partial n_0} = \frac{\partial\chi^2}{\partial\mu} = \frac{\partial\chi^2}{\partial\sigma} = 0.$$

Thus there are 3 equations in 3 unknowns, and the minimization process is fully deterministic.

If, as mentioned above, the total incubation time $m\tau_0$ spanned by the entire database is not long enough to cover the full extent of the log normal distribution, one will have to constrain the fitting procedure to ensure that the area under the log normal is exactly equal to the total number of cases $N$ over the time $m\tau_0$. The specific question one seeks to answer here is: given there are $N$ cases to be randomly distributed into $m$ time intervals in accordance with a prescribed set of average proportions $\{n_k\}$, $k = 1, 2, \cdots, m$, satisfying

$$n_1 + n_2 + \cdots + n_m = N, \qquad (9)$$

and $n_k = n_k(n_0, \mu, \sigma)$, how would one tune $n_0$, $\mu$, and $\sigma$ to maximize the likelihood of the hypothetical distribution agreeing with the data, when Poisson counting uncertainties in the latter are taken into account? Note that the model for $n_k$ does not have to cut off at $k = m$, i.e.
(9) is merely there to enforce the equality between the expected and actual total number of cases within the full range of incubation times available to the study. In this way, one is obliged to respect only those 'in range' parameter values ensuing from the best-fit model. Thus one would now extremize the statistic

$$F = \chi^2 + \lambda\left(\sum_{k=1}^{m} n_k - N\right)$$

by requiring

$$\frac{\partial F}{\partial n_0} = \frac{\partial F}{\partial\mu} = \frac{\partial F}{\partial\sigma} = \frac{\partial F}{\partial\lambda} = 0, \qquad (11)$$

where the vanishing of the last partial derivative enforces the area ($\lambda$ is a Lagrange multiplier). Once again there are 4 equations to solve for 4 unknowns, and the number of free parameters is reduced from the previous 3 to the current 2 (note however that the area constraint is not as simple as fixing $n_0$; note also that the number of degrees of freedom of the whole problem is increased from $m - 3$ to $m - 2$).

Turning to the confidence interval for a parameter of interest $\eta(n_k) = \eta(\mu, \sigma, n_0)$, such as the expected (arithmetic) mean incubation time within the observation interval $m\tau_0$, one could re-extremize $F$ subject to yet another constraint, which ensures that $\eta$ equals some fixed value $\eta_0$, by invoking one more Lagrange multiplier $\nu$ to form the $G$ statistic,

$$G = F + \nu\,(\eta - \eta_0),$$

and requiring

$$\frac{\partial G}{\partial n_0} = \frac{\partial G}{\partial\mu} = \frac{\partial G}{\partial\sigma} = \frac{\partial G}{\partial\lambda} = \frac{\partial G}{\partial\nu} = 0. \qquad (14)$$

The resulting increase in $\chi^2$ w.r.t. (11), $\Delta\chi^2$, is also $\chi^2$ distributed with one degree of freedom, because the extra constraint enforced by $\nu$ has likewise increased the degrees of freedom by one. Thus, to obtain the 95 % confidence interval in $\eta$, one needs to find the value of $\eta_0$ such that (14) leads to a $\chi^2$ increase of $\Delta\chi^2 = 3.8$. This procedure applies if $\eta$ is a mean, variance, or quantile, or any other attribute of the distribution.
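The constrained fit and the $\Delta\chi^2$ interval described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation. It assumes the same log normal form $n_k = (n_0/\tau_k)\exp[-(\ln\tau_k - \mu)^2/(2\sigma^2)]$ with $\tau_0 = 1$ day. It imposes the area constraint by substituting $n_0$ directly instead of carrying the Lagrange multipliers $\lambda$ and $\nu$, which for equality constraints of this kind should lead to the same constrained minimum. It takes the median $e^{\mu}$ as the parameter of interest $\eta$, and it uses the threshold 3.84, the 95 % point of a $\chi^2$ distribution with one degree of freedom, which the text rounds to 3.8. The counts are synthetic and the function names and grids are hypothetical.

```python
import math


def lognormal_shape(mu, sigma, m, tau0=1.0):
    """Assumed model shape s_k = exp(-(ln tau_k - mu)^2 / (2 sigma^2)) / tau_k."""
    shape = []
    for k in range(1, m + 1):
        tau_k = k * tau0
        z = (math.log(tau_k) - mu) / sigma
        shape.append(math.exp(-0.5 * z * z) / tau_k)
    return shape


def constrained_chi2(observed, mu, sigma):
    """Chi-square with n0 fixed by the area constraint sum_k n_k = N."""
    shape = lognormal_shape(mu, sigma, len(observed))
    n0 = sum(observed) / sum(shape)  # model total equals observed total N
    return sum((n0 * s - o) ** 2 / (n0 * s) for o, s in zip(observed, shape))


def profile_chi2(observed, mu, sigma_grid):
    """Minimum constrained chi-square over sigma with the median exp(mu) held fixed."""
    return min(constrained_chi2(observed, mu, sigma) for sigma in sigma_grid)


def median_interval(observed, mu_grid, sigma_grid, delta=3.84):
    """Best-fit median plus the outermost medians with Delta chi2 <= delta."""
    profile = [profile_chi2(observed, mu, sigma_grid) for mu in mu_grid]
    chi2_min = min(profile)
    accepted = [mu for mu, c in zip(mu_grid, profile) if c - chi2_min <= delta]
    best_mu = mu_grid[profile.index(chi2_min)]
    return math.exp(best_mu), math.exp(accepted[0]), math.exp(accepted[-1])


# Synthetic, noise-free counts over 17 one-day bins (not the patient data).
mu_grid = [1.5 + 0.01 * i for i in range(101)]    # hypothetical scan range
sigma_grid = [0.2 + 0.01 * i for i in range(81)]  # hypothetical scan range
true_mu, true_sigma, total = mu_grid[45], sigma_grid[30], 110.0
shape = lognormal_shape(true_mu, true_sigma, 17)
counts = [total * s / sum(shape) for s in shape]

median, lower, upper = median_interval(counts, mu_grid, sigma_grid)
print(f"median = {median:.2f} days, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

If the accepted range reaches the edge of the scanned grid, the interval is not closed by the data within that grid, which is the one-dimensional analogue of an open confidence contour.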
By adopting the aforementioned procedure, we obtained the best-fit parameters as shown in Table 1, the goodness of fit in Figures 1 and 2, cumulative frequencies in Figures 3 and 4, and 95 % confidence contours in Figure 5, for the two age groups 15-64 and 65-86. The two age groups being analyzed are clearly distinct samples. For the 15-64 age group, the median incubation time of $7.00^{+1.10}_{-0.90}$ days (uncertainties are 95 % CL, see Table 1) is broadly consistent with previous measurements [2, 3, 4, 5, 6]. For the 65-86 age group, the median of $10.9^{+2.7}_{-2.0}$ days is statistically significantly longer. The other equally important point is, as revealed by the open confidence contour in Figure 5b, [...] beyond which one ran out of data even though the differential case histogram has not yet peaked (Figure 3). This indicates the potential of much longer [...]

Table 1: Parameters of the best log normal fit to the two age groups. $\mu$ and $\sigma$ are as defined in (3), while the other parameters are calculated by applying the best-fit model to the incubation time range of 17 days, which is the full range spanned by the data (true for both age groups). The information on each parameter comprises an expectation value sandwiched between the lower and upper uncertainty limits, both of which are at 95 % confidence (note that for the 65+ age group the upper $\mu$ and $\sigma$ uncertainties are not constrained by the data, because the data have not revealed the other side of the peak of the differential probability distribution, see [...]
[...] probability of null hypothesis rejection). Note however that for this age group the data have not revealed the other side of the peak of the differential probability distribution.
[1] Global research on Coronavirus diseases
[2] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[3] Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan
[4] Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data
[5] Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
[6] The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application
[7] Parameter Estimation in astronomy through application of the likelihood ratio