key: cord-0844061-0o834sx9
authors: Ejima, Keisuke; Kim, Kwang Su; Ludema, Christina; Bento, Ana I.; Iwanami, Shoya; Fujita, Yasuhisa; Ohashi, Hirofumi; Koizumi, Yoshiki; Watashi, Koichi; Aihara, Kazuyuki; Nishiura, Hiroshi; Iwami, Shingo
title: Estimation of the incubation period of COVID-19 using viral load data
date: 2021-03-15
journal: Epidemics
DOI: 10.1016/j.epidem.2021.100454
sha: 0c52cec8f2a2604e3cdffcc90ea66289b6ef5955
doc_id: 844061
cord_uid: 0o834sx9

The incubation period, or the time from infection to symptom onset, of COVID-19 has usually been estimated by using data collected through interviews with cases and their contacts. However, this estimation is influenced by uncertainty in the cases’ recall of exposure time. We propose a novel method that uses viral load data collected over time since hospitalization, hindcasting the timing of infection with a mathematical model for viral dynamics. As an example, we used reported data on viral load for 30 hospitalized patients from multiple countries (Singapore, China, Germany, and Korea) and estimated the incubation period. The median, 2.5, and 97.5 percentiles of the incubation period were 5.85 days (95% CI: 5.05, 6.77), 2.65 days (2.04, 3.41), and 12.99 days (9.98, 16.79), respectively, which are comparable to the values estimated in previous studies. Using viral load to estimate the incubation period might be a useful approach, especially when it is impractical to directly observe the infection event.

The current COVID-19 outbreak is characterized by a longer incubation period (i.e., time from infection to symptom onset) than that of influenza and other acute respiratory viruses. This longer incubation period means that many of the strategies for disease control that rely on symptom-based surveillance (e.g., community fever monitoring or home observation of travelers for symptoms) will not effectively control the outbreak.
For example, the wide geographic spread of SARS-CoV-2 could have been driven by this long incubation period, which allowed cases to pass through border control measures such as temperature screening (1).

Estimating the incubation period is challenging because we rarely directly observe the time of infection or the time of symptom onset (examples to the contrary in HIV infection show the intense follow-up needed to observe these events (2, 3)). The first study estimating the incubation period of SARS-CoV-2 was that of Li et al (4), who fit a log-normal model to a subset of cases for whom detailed information about exposure to another case was available. However, even with meticulous contact tracing, directly observing infector-infectee pairs is a time-consuming process, especially when the incubation period is lengthy. Measuring the incubation period through contact tracing is more difficult if the infector-infectee pair had a lot of contact with each other, which leads to a number of suspected individuals needing to be interviewed. Indeed, Bi et al demonstrated large uncertainty (the interval of exposure was more than 10 days for about 25% of the cases) concerning the timing of infection for COVID-19 in China (4). Although a majority of these studies (4-6) used a statistical modeling technique that accounts for uncertainty in both the reports of exposure time and the time of symptom onset (7), they inherently had to use a heuristic weight function for the censored information.

Here we propose another approach to estimating the incubation period, in which we use longitudinal data on viral load and hindcast the point of initial infection. Viral load data were collected at the early stage of the epidemic for clinical purposes (e.g., understanding the etiology and the pathophysiology of COVID-19) and to ensure patients were no longer shedding virus (or more precisely, viral fragments) before hospital discharge.
The data were analyzed using a mathematical model describing viral dynamics, which typically traces a bell-shaped curve (i.e., viral load first increases exponentially until it peaks and then declines). Although the data are available only after the onset of symptoms, the timing of infection can be estimated by hindcasting the model for each case.

Journal Pre-proof

We extracted viral load data for 30 hospitalized patients as reported in four papers (Table S1) and quantified the dynamics of SARS-CoV-2 infection with a mathematical model previously proposed (8-10):

$$\frac{df(t)}{dt} = -\beta f(t) V(t), \qquad \frac{dV(t)}{dt} = \gamma f(t) V(t) - \delta V(t),$$

where $f(t)$ and $V(t)$ are the relative fraction of uninfected target cells at time $t$ to those at time 0 and the amount of virus at time $t$, respectively. The parameters $\beta$, $\gamma$, and $\delta$ are the rate constant for virus infection, the maximum rate constant for viral replication, and the death rate of infected cells, respectively. The viral load data from the four different papers were fitted to the model with mixed effects, which assumed that the parameters for each individual follow normal distributions with the same population mean.

Several models other than the one described above are available to explain the viral load trajectory of acute infection. However, we chose this model because it better explains the data. As an example, a model that includes an eclipse phase has been proposed for acute infection and has been applied to SARS-CoV-2 (11, 12). We fitted the models with and without the eclipse phase and compared their goodness of fit (i.e., BIC). Although the BICs were comparable, the model with the eclipse phase required us to fix the parameter value that determines the length of the eclipse phase. Thus, we decided to use the current model without the eclipse phase.

The viral load dynamics for each case in Asia and Europe are shown in Figures 1 and 2, respectively, and the estimated values of the parameters for each case are summarized in Table S2. The peak of viral load appeared 2 to 3 days after symptom onset.
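The behaviour of this kind of model, and the idea of hindcasting it, can be illustrated with a minimal sketch. It assumes the two-equation target-cell-limited form df/dt = -beta*f*V and dV/dt = gamma*f*V - delta*V, takes time 0 to be symptom onset with f(0) = 1, and integrates with a hand-written fourth-order Runge-Kutta step. All numerical values (the parameters, the viral load at onset, and the threshold) are hypothetical and chosen only for illustration; they are not estimates from this study, and the function names are not from the authors' code.

```python
def rhs(f, v, beta, gamma, delta):
    """Right-hand side of the assumed model: f = target-cell fraction, v = viral load."""
    df = -beta * f * v
    dv = gamma * f * v - delta * v
    return df, dv


def rk4_step(f, v, dt, beta, gamma, delta):
    """One classical Runge-Kutta step; a negative dt integrates backward in time."""
    k1f, k1v = rhs(f, v, beta, gamma, delta)
    k2f, k2v = rhs(f + 0.5 * dt * k1f, v + 0.5 * dt * k1v, beta, gamma, delta)
    k3f, k3v = rhs(f + 0.5 * dt * k2f, v + 0.5 * dt * k2v, beta, gamma, delta)
    k4f, k4v = rhs(f + dt * k3f, v + dt * k3v, beta, gamma, delta)
    f_new = f + dt * (k1f + 2 * k2f + 2 * k3f + k4f) / 6.0
    v_new = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return f_new, v_new


def hindcast_infection_time(v_onset, threshold, params, dt=0.01, max_days=30.0):
    """Step backward from symptom onset (t = 0) until viral load falls to the threshold."""
    f, v, t = 1.0, v_onset, 0.0
    while v > threshold and t > -max_days:
        f, v = rk4_step(f, v, -dt, *params)
        t -= dt
    return t  # negative: days before symptom onset


def time_of_peak(v_onset, params, dt=0.01, max_days=30.0):
    """Step forward from symptom onset and record when viral load is highest."""
    f, v, t = 1.0, v_onset, 0.0
    best_t, best_v = t, v
    while t < max_days:
        f, v = rk4_step(f, v, dt, *params)
        t += dt
        if v > best_v:
            best_t, best_v = t, v
    return best_t, best_v


# Hypothetical values for illustration only (beta, gamma, delta in per-day units).
params = (5e-8, 4.0, 1.0)
v_onset = 1e5       # assumed viral load at symptom onset (copies/mL)
threshold = 1e-2    # assumed viral load marking established infection

t_inf = hindcast_infection_time(v_onset, threshold, params)
t_peak, v_peak = time_of_peak(v_onset, params)
print(f"hindcast infection time: {t_inf:.2f} days relative to onset")
print(f"incubation period:       {-t_inf:.2f} days")
print(f"peak viral load at:      {t_peak:.2f} days after onset")
```

With these illustrative values the viral load rises, peaks a few days after onset, and then decays at roughly the rate delta once target cells are depleted, which is the bell shape described in the text. The hindcast infection time depends strongly on the chosen threshold, so the threshold should be treated as an input with its own uncertainty.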
Note that in the data, there were no cases in which viral load was measured before symptom onset. Among the total 30 cases, viral load was the highest in the first measurement in 14 cases.

To assess the day on which SARS-CoV-2 infection was established, in other words, the start of the exponential growth phase of the viral load (9), we needed to set the viral load threshold for this timing. The time of the infection event, $T_{\mathrm{inf}}$, was identified from the dataset by back-calculating the time at which the viral load reaches the threshold. We used the three cases reported from China (Patients D, H, and L) with known primary cases to determine the viral load threshold to establish infection (13). For these three cases, the day of exposure was assumed to be equal to the day of the infection event, as follows. [...] as the threshold for further analyses. Because the viral load thresholds estimated for the three patients differed substantially, we performed the same analysis for each patient and used the thresholds to estimate the distribution of incubation periods as sensitivity analyses.

With the viral load threshold, we computed the incubation period, $T_{\mathrm{inf}}$, for all patients by hindcasting the mathematical model after fitting the model to the data. To address the uncertainty of the estimation, we resampled 100 parameter sets for each individual, including the viral load threshold, and obtained the corresponding 100 values of $T_{\mathrm{inf}}$ for each individual (i.e., 100×30 values in total) (see "The nonlinear mixed effect model" for the details of computation). Then, three parametric distributions were fitted to the 100×30 values of $T_{\mathrm{inf}}$: the Weibull, gamma, and log-normal distributions. We compared the Akaike information criterion (AIC) of those three distributions and used the best model (i.e., that with the lowest AIC) for further analyses. The parametric bootstrap method was used to assess parameter uncertainty.
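The fitting step, together with the per-individual resampling scheme this section goes on to describe, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits only the log-normal distribution (whose maximum-likelihood estimates have a closed form) rather than comparing Weibull, gamma, and log-normal fits, and it runs on synthetic incubation periods generated with hypothetical settings (30 individuals, 100 draws each, a true median of 5.85 days, and spread values picked for illustration). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def fit_lognormal(samples):
    """Closed-form maximum-likelihood fit; returns (mu, sigma) on the log scale."""
    logs = [math.log(x) for x in samples]
    mu = mean(logs)
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs))
    return mu, sigma


def lognormal_aic(samples, mu, sigma):
    """AIC = 2k - 2 log L, with k = 2 fitted parameters."""
    log_lik = sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in samples
    )
    return 2 * 2 - 2.0 * log_lik


def lognormal_quantile(p, mu, sigma):
    """Quantile of a log-normal distribution with log-scale parameters mu, sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def percentile(values, p):
    """Simple order-statistic percentile of a list (p in [0, 1])."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


rng = random.Random(0)

# Synthetic data (assumption): one "true" incubation period per individual,
# plus 100 noisy draws per individual standing in for resampled parameter sets.
centers = [rng.lognormvariate(math.log(5.85), 0.4) for _ in range(30)]
draws = [[rng.lognormvariate(math.log(c), 0.1) for _ in range(100)] for c in centers]

# Fit the distribution to all 100 x 30 values and compute its AIC.
pooled = [x for person in draws for x in person]
mu_hat, sigma_hat = fit_lognormal(pooled)
aic = lognormal_aic(pooled, mu_hat, sigma_hat)
median_hat = lognormal_quantile(0.5, mu_hat, sigma_hat)

# Bootstrap: pick one draw per individual, refit, and store the fitted median.
boot_medians = []
for _ in range(1000):
    sample = [rng.choice(person) for person in draws]
    mu_b, sigma_b = fit_lognormal(sample)
    boot_medians.append(lognormal_quantile(0.5, mu_b, sigma_b))

ci_low = percentile(boot_medians, 0.025)
ci_high = percentile(boot_medians, 0.975)
print(f"median {median_hat:.2f} days, 95% interval ({ci_low:.2f}, {ci_high:.2f}), AIC {aic:.1f}")
```

The same `lognormal_quantile` helper gives a quick consistency check on the reported summary: a log-normal distribution with a median of 5.85 days and a log-scale standard deviation of about 0.41 (a value back-calculated here, not given in the text) has 2.5th and 97.5th percentiles of roughly 2.6 and 13.0 days. Comparing candidate distributions would require computing the AIC for Weibull and gamma fits as well, which need numerical optimization and are omitted from this sketch.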
Specifically, the bootstrap sample was composed of 30 values of $T_{\mathrm{inf}}$: a single value was resampled from the 100 values of $T_{\mathrm{inf}}$ of each individual. The best parametric model (i.e., Weibull, gamma, or log-normal distribution) was fitted to the bootstrapped data for parameter inference. We repeated this process 1000 times and obtained 1000 parameter sets, and the median, 2.5, and 97.5 percentiles of the distribution were computed. As a sensitivity analysis, the above process was repeated with the data from Europe (Germany) and Asia (China, Singapore, and Korea) separately.

Inferring the timing of infection is challenging in general. Given asymptomatic and presymptomatic transmission and the relatively long incubation period of SARS-CoV-2, not all patients are aware of how they were exposed or the specific time of exposure. The median incubation period of SARS-CoV-2 is estimated to be 5 to 6 days (4-6, 14), whereas those for other acute respiratory viral infections, such as SARS-CoV-1, non-SARS human coronaviruses, influenza A virus, and influenza B virus, are estimated to be 4.0, 3.2, 1.4, and 0.6 days, respectively (15). The proportion of SARS-CoV-2 infection that is asymptomatic ranges from 40% to 45%, which is close to that for influenza (50%) (16). By contrast, asymptomatic cases are rarely observed for SARS-CoV-1 (17). Thus, we proposed using viral load data, which are externally measured and are independent of recall. The median of the estimated incubation period was about 6 days, and 97.5% of cases developed symptoms within about 13 days. These estimates are consistent with previously published estimates (4-6, 14).

[...] however, the risk of resurgence will not be negligible (influenza outbreaks happen even though effective vaccines are available).
In addition to the vaccine, contact tracing is important to reduce the risk of resurgence, and being able to make valid estimates of the incubation period helps to reduce the burden of the contact tracing process. Indeed, contact tracing helped to identify and treat cases earlier than a symptom-based approach (4). Furthermore, when we know the incubation-period distribution, we can better assess the role of presymptomatic infection in the outbreak. Combined with the serial-interval distribution, the incubation-period distribution has been used to quantify the magnitude of presymptomatic infection (18).

The strength of this approach is that it can compensate for the limitations of the classic interview-based approach in ascertaining the exposure event. Our proposed approach may be applicable not only to human infectious diseases and zoonoses such as influenza and COVID-19, but also to animal/livestock infectious diseases such as foot-and-mouth disease, for which contact recall is not possible. Furthermore, reconstructing the viral load trajectory from infection to recovery is helpful not only for estimating the incubation period but also for clinical and epidemiologic understanding of the disease. For example, we observed that the viral load of SARS-CoV-2 peaked 2 to 3 days after the onset of symptoms, which is consistent with the finding that the viral load in throat swabs was on the decline when first measured (2-4 days since symptom onset) (18, 19).

There are several limitations to be noted in this study. One is related to the modelling approach. Our approach did not account for any uncertainty in the reported day of symptom onset because the data did not include the range of exposed days. An approach accounting for this uncertainty was previously proposed by Reich et al (7).
Combining our approach with that of Reich et al might reduce uncertainty surrounding the precise reporting of exposure and illness onset once such data are available, which is feasible because estimation of the timing of infection and estimation of the incubation-period distribution are independent. The model we used in this study did not include a detailed immune response or antiviral effects, given the limited information. We can update the model once relevant data are available.

Another limitation is relevant to the data we used. The proposed approach requires collection of viral loads over time since symptom onset, which might not be feasible for all patients or in resource-limited contexts. A few studies have investigated change in viral load over time (20, 21). However, those studies included viral load data with observation at a single time point because they did not consider individual variability. Further, one paper assumed that the incubation period was 5 days when symptom onset information was not available (20). We believe such an approach is unreasonable because 1) the day of exposure is extremely hard to observe (and therefore we are proposing to use longitudinal viral load data), and 2) the incubation period varies between patients. We acknowledge that the inclusion criterion for data in our study (more than three data points from each patient) is a limitation; however, we do not think that adding nonlongitudinal data or data without information on symptom onset would be an option.

We used data from hospitalized and symptomatic patients. If viral dynamics and the incubation period differ in unhospitalized patients, the estimated incubation-period distribution should be taken to represent that for hospitalized patients only. Indeed, we are planning to collect saliva samples from mildly symptomatic to asymptomatic patients (https://rctportal.niph.go.jp/en/detail?trial_id=jRCT2071200023).
We used the viral load data collected from upper respiratory specimens (i.e., nasopharyngeal, oropharyngeal, and nasal swabs), because viral dynamics differ between organs, as evidenced in multiple studies (e.g., rectal swab vs. nasal swab) (22, 23). However, viral dynamics might also differ between nasal, nasopharyngeal, and oropharyngeal swabs, even though these sites are close to one another. Similarly, sex, age, and other factors might influence viral dynamics; however, such information was not consistently available for all patients. We tried treating the different types of swabs as a covariate in the model, but the computation did not converge because of the small sample sizes. However, because we used a mixed-effect model, the random effect in every parameter (on each patient) should have accounted, to some extent, for the difference in viral dynamics due to the sample type and demographic differences.

Being able to make valid estimates of the incubation-period distribution is essential for mitigating risk. Knowing the estimated incubation-period distribution simplifies the process of contact tracing and improves our understanding of the role of presymptomatic infection. By unifying the proposed approach with existing epidemiologic methods, we can achieve precise estimation of the incubation-period distribution.

The viral load data were from 30 hospitalized patients presented in four previously published studies of hospitalized COVID-19 patients (13, 19, 23, 24). All cases used in our analysis presented with symptoms before or after hospitalization. For consistency, the viral load data from upper respiratory specimens were used in the analysis. Patients treated with antivirals or with less than two data points were excluded. For all the studies from which we extracted data, ethics approval was obtained from the ethics committee at each institute.
Written informed consent was obtained from the patients or their next of kin in the original studies. We summarized the data in Table S1 and [...]

1. Impact of international travel and border control measures on the global spread of the novel 2019 coronavirus outbreak
2. Prospective Study of Acute HIV-1 Infection in Adults in East Africa and Thailand
3. Molecular dating and viral load growth rates suggested that the eclipse phase lasted about a week in HIV-1 infected adults in East Africa and Thailand
4. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study
5. The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application
6. Incubation Period and Other Epidemiological Characteristics of Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data
7. Estimating incubation period distributions with coarse data
8. Quantifying the effect of Vpu on the promotion of HIV-1 replication in the humanized mouse model
9. Modelling viral and immune system dynamics
10. Modelling SARS-CoV-2 Dynamics: Implications for Therapy
11. Kinetics of Influenza A Virus Infection in Humans
12. Timing of Antiviral Treatment Initiation is Critical to Reduce SARS-CoV-2 Viral Load
13. SARS-CoV-2 Viral Load in Upper Respiratory Specimens of Infected Patients
14. Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China
15. Incubation periods of acute respiratory viral infections: a systematic review
16. Transmission of Influenza: Implications for Control in Health Care Settings
17. Asymptomatic severe acute respiratory syndrome-associated coronavirus infection
18. Temporal dynamics in viral shedding and transmissibility of COVID-19
19. Virological assessment of hospitalized patients with COVID-2019
20. SARS-CoV-2 viral load peaks prior to symptom onset: a systematic review and individual-pooled analysis of coronavirus viral load from 66 studies
21. Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction-Based SARS-CoV-2 Tests by Time Since Exposure
22. Characteristics of pediatric SARS-CoV-2 infection and potential evidence for persistent fecal viral shedding
23. Epidemiologic Features and Clinical Course of Patients Infected With SARS-CoV-2 in Singapore
24. Clinical Course and Outcomes of Patients with Severe Acute Respiratory Syndrome Coronavirus 2 Infection: a Preliminary Report of the First 28 Patients from the Korean Cohort Study on COVID-19

The cumulative distribution function for total, Asian, and European cases, respectively. We used the log-normal distribution for fitting. The grey lines were drawn based on the 1000 different bootstrap samples. The horizontal bars are 95% CIs at 2.5%, 50%, and 97.5% of the distribution. The solid red curve corresponds to the median of the estimated distribution. (D, E, F) The probability density function for total

The authors declare that they have no competing interests.