key: cord-0540962-x0q9ekcz authors: Sanche, Steven; Lin, Yen Ting; Xu, Chonggang; Romero-Severson, Ethan; Hengartner, Nicolas W.; Ke, Ruian title: The Novel Coronavirus, 2019-nCoV, is Highly Contagious and More Infectious Than Initially Estimated date: 2020-02-09 sha: 0e1451f1b7ecda5f1e941bf8a2ee4cdc0394c445 doc_id: 540962 cord_uid: x0q9ekcz
The novel coronavirus (2019-nCoV) is a recently emerged human pathogen that has spread widely since January 2020. Initially, the basic reproductive number, R0, was estimated to be 2.2 to 2.7. Here we provide a new estimate of this quantity. We collected extensive individual case reports and estimated key epidemiological parameters, including the incubation period. Integrating these estimates and high-resolution real-time human travel and infection data with mathematical models, we estimated that the number of infected individuals during the early epidemic doubled every 2.4 days, and that R0 is likely to be between 4.7 and 6.6. We further show that quarantine and contact tracing of symptomatic individuals alone may not be effective, and that early, strong control measures are needed to stop transmission of the virus.
real-time domestic travel data in China. Third, to address potential data collection and methodological bias or incomplete control of confounding variables, we implemented two distinct modeling approaches using different sets of data. These analyses produced estimates of the exponential growth rate that are consistent with one another and higher than previous estimates. A unique feature of our case report dataset (Table S1) is that it includes reports of many of the first or first few individuals confirmed with the virus infection in each province, for whom dates of departure from Wuhan were reported. Altogether, we collected 140 individual case reports (Table S1).
These reports include demographic information (age, sex and location of hospitalization) as well as epidemiological information (potential time periods of infection and dates of symptom onset, hospitalization and case confirmation). Using this dataset, we estimated the distributions of the basic durations from initial exposure to symptom onset, from symptom onset to hospitalization, and from hospitalization to discharge or death. Our estimate of the time from initial exposure to symptom onset is 4.2 days, with a 95% confidence interval (hereafter CI) of 3.5 to 5.1 days (Fig. 1C). This estimated period is about 1 day shorter, and has lower variance, than a previous estimate (1). The shorter time is likely caused by the expanded temporal range of our data, which include cases occurring after broad public awareness of the disease. Patients reported in the Li et al. study (1) are all from Wuhan and most had symptom onset before mid-January; in our dataset, many patients had symptom onset during or after mid-January and were reported in provinces other than Hubei (whose capital is Wuhan). The time from symptom onset to hospitalization showed evidence of time dependence (Figs. 1D and S1). Before January 18, the time from symptom onset to hospitalization was 5.5 days (CI: 4.6 to 6.6 days), whereas after January 18 it shortened significantly to only 1.5 days (CI: 1.2 to 1.9 days; p < 0.001, Mann-Whitney U test). The change in the distribution coincides with the period when infected cases were first confirmed in Thailand, news reports of potential human-to-human transmission appeared, and China CDC upgraded its emergency response to Level 1. The emerging consensus about the risk of 2019-nCoV likely led to a significant behavior change over this period, with symptomatic people seeking medical care more promptly. We also found that the time from initial hospital admittance to discharge is 11.5 days (Fig.
1E; CI: 8.0 to 17.3 days) and the time from initial hospital admittance to death is 11.2 days (Fig. 1F; CI: 8.7 to 14.9 days). Moving from empirical estimates of basic epidemiological parameters to an understanding of the actual epidemiology of 2019-nCoV requires model-based inference. We thus used mathematical models to integrate the empirical estimates with spatiotemporal domestic travel and infection data from outside Hubei province to infer the outbreak dynamics in Wuhan. Inference based on data from outside Hubei is more reliable because, aware of the risk of virus transmission, other provinces implemented intensive surveillance systems as the outbreak in Wuhan unfolded, detecting individuals with high temperatures and closely tracking travelers from Wuhan using digital data to identify infected individuals (6). We collected real-time travel data during the epidemic using the Baidu® Migration server (Fig. 2A and Table S2). The server is an online platform summarizing mobile phone travel data through Baidu® Huiyan [https://huiyan.baidu.com/], a widely used positioning system in China that processes more than 120 billion positioning requests daily through GPS, WiFi and other means [https://huiyan.baidu.com/]. The data therefore represent a reliable, real-time and high-resolution source of travel patterns in China. We extracted daily travel data from Wuhan to each province. We found that, in general, between 40,000 and 140,000 people traveled from Wuhan to destinations outside Hubei province daily before the lockdown of the city on January 23, with travel peaks on January 9, 21 and 22 (Fig. 2B). This massive flow of people from Hubei province during January likely facilitated the rapid dissemination of the virus. We integrated the travel data into our inferential models using two approaches.
The rationale of the first model, the 'first-arrival' approach, is that an increasing fraction of people infected in Wuhan increases the likelihood that one such case is exported to other provinces. Hence, how soon new cases are observed in other provinces can inform disease progression in Wuhan (Fig. 2C). This approach is similar to earlier analyses that estimated the size of the 2019-nCoV outbreak in Wuhan from international travel data (5, 7, 8), although inference based on infected cases outside China may suffer from large uncertainty due to the low volume of international travel. In our model, we assumed exponential growth for the infected population $I^*$ in Wuhan, $I^*(t) = \exp\left(r(t - t_0)\right)$, where $r$ is the exponential growth rate and $t_0$ is the time at which exponential growth is initiated, i.e. $I^*(t_0) = 1$. Note that $t_0$ is likely to be later than the date of the first infection event, because multiple infections may be needed before the onset of exponential growth (9). We used travel data to each province (Table S3) and the earliest times at which an infected individual arrived in each of 26 provinces (Fig. 2D) to infer $r$ and $t_0$ (see Supplementary Materials for details). Model predictions of arrival times in the 26 provinces fitted the actual data well (Fig. S2). We estimated that exponential growth began on December 20, 2019 (CI: December 11 to 26). This suggests that human infections in early December may be due to spillovers from the animal reservoir or limited chains of transmission (10, 11). The growth rate of the outbreak is estimated to be 0.29 per day (CI: 0.21 to 0.37 per day), a much higher rate than two recent estimates (1, 5). This growth rate corresponds to a doubling time of 2.4 days. We further estimated that the total infected population in Wuhan was approximately 4,100 (CI: 2,423 to 6,178) on January 18, which is remarkably consistent with a recently posted estimate (7).
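As a quick arithmetic check of the numbers above: the doubling time of exponential growth $I^*(t) = \exp(r(t - t_0))$ is $\ln 2 / r$. The sketch below (function names are hypothetical, chosen for illustration) reproduces the 2.4-day figure from r = 0.29 per day and evaluates the point curve at the estimated initiation date. It is only a back-of-the-envelope calculation and is not expected to match the model-based estimate for January 18 exactly.

```python
import math
from datetime import date


def doubling_time(r):
    """Doubling time (days) of exponential growth at rate r per day."""
    return math.log(2) / r


def infected_size(on, r, t0):
    """Point value of I*(t) = exp(r (t - t0)), so that I*(t0) = 1.

    `on` and `t0` are calendar dates; t - t0 is measured in whole days.
    """
    return math.exp(r * (on - t0).days)


r_hat = 0.29                    # per day, 'first-arrival' estimate
t0_hat = date(2019, 12, 20)     # estimated initiation of exponential growth

print(f"doubling time: {doubling_time(r_hat):.2f} days")   # 2.39
print(f"I*(Jan 18) ~ {infected_size(date(2020, 1, 18), r_hat, t0_hat):,.0f}")
```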
The estimated number of infected individuals was 18,700 (CI: 7,147 to 38,663) on January 23, the date on which the Wuhan lockdown started. We projected that, without any control measures, the infected population would have reached approximately 233,400 (CI: 38,757 to 778,278) by the end of January (Fig. S3). An alternative model, the 'case count' approach, used daily case counts between January 19 and 26 from provinces outside Hubei to infer the initiation time and growth rate of the outbreak. We restricted the data to this period because infected persons found outside Hubei province during this time generally reported visiting Wuhan within 14 days of becoming symptomatic, i.e. cases during that period were indicative of the dynamics in Wuhan. We developed a metapopulation model based on the classical SEIR model (12). We assumed deterministic exponential growth for the infected population in Wuhan, whereas in other provinces we represented the trajectories of infected individuals who traveled from Wuhan using a stochastic agent-based model. The transitions of infected individuals from symptom onset to hospitalization and then to case confirmation were assumed to follow the distributions inferred from the case report data (see Supplementary Materials for details). Simulation of the model using the best-fit parameters showed that the model described the observed case counts over time well (Fig. 2E). The estimated date of exponential growth initiation is December 16, 2019 (CI: December 12 to 21) and the exponential growth rate is 0.30 per day (CI: 0.26 to 0.34 per day). These estimates are consistent with those of the 'first-arrival' approach (Figs. 2F and G, and Fig. S4). We note that in both approaches we assumed perfect detection of infected cases outside Hubei province, i.e. that the dates of first arrival and the case counts are accurate.
This could be a reasonable assumption for symptomatic individuals because of the intensive surveillance implemented in China, for example tracking individuals' movements from digital transportation data (6). However, it is possible that a fraction of infected individuals, for example those with mild or no symptoms (13), were not hospitalized, in which case we would underestimate the true size of the infected population in Wuhan. We undertook sensitivity analyses to investigate how this issue affects our estimates under both approaches (see Supplementary Materials for details). We found that if a proportion of cases remained undetected, the estimated time of exponential growth initiation would be earlier than December 20, translating into a larger infected population in January, but the estimated growth rate remained the same. Overall, the convergence of the growth-rate estimates from the two approaches emphasizes the robustness of our estimates to model-dependent assumptions. Our estimated outbreak growth rate is significantly higher than two recent reports in which the growth rate was estimated to be 0.1 per day (1, 5). These estimates were based on early case counts from Wuhan (1) or international air travel data (5). However, these data suffer from important limitations. Case counts in Wuhan during the early outbreak are likely to be underreported for many reasons, and because the number of individuals traveling abroad is low compared with the total population of Wuhan, inference of the infected population size and outbreak growth rate from infected cases outside China suffers from large uncertainty (7, 8). Our estimated exponential growth rate, 0.29 per day (a doubling time of 2.4 days), is consistent with the rapidly growing outbreak during late January (Fig. 1A). Using the exponential growth rate, we next estimated the range of the basic reproductive number, R0.
It has been shown that this estimate depends on the distributions of the latent period (defined as the period between the time an individual is infected and the time they become infectious) and the infectious period (14). For both periods, we assumed a gamma distribution and varied its mean and shape parameter over a large range to reflect the uncertainty in these distributions (see Supplementary Materials). It is not clear when an individual becomes infectious; thus, we considered two scenarios: 1) the latent period is the same as the incubation period, and 2) the latent period is 2 days shorter than the incubation period, i.e. individuals start to transmit 2 days before symptom onset. Integrating the uncertainties in the exponential growth rate estimated from the 'first-arrival' approach and in the durations of the latent and infectious periods, we estimated R0 to be 6.3 (CI: 3.3 to 11.3) and 4.7 (CI: 2.8 to 7.6) for the first and second scenarios, respectively (Fig. 3A). Using the estimates from the 'case count' approach, we obtained slightly higher R0 values of 6.6 (CI: 4.0 to 10.5) and 4.9 (CI: 3.3 to 7.2) for the first and second scenarios, respectively (Fig. S5). Overall, we report that R0 is likely to be between 4.7 and 6.6, with a CI spanning 2.8 to 11.3. We argue that the high R0 and the relatively short incubation period led to the extremely rapid growth of the 2019-nCoV outbreak compared with the 2003 SARS epidemic, in which R0 was estimated to be between 2.2 and 3.6 (15, 16). The high R0 values we estimated have important implications for disease control. For example, basic theory predicts that the force of infection has to be reduced by a fraction $1 - 1/R_0$ to guarantee extinction of the disease. At R0 = 2.2 this fraction is only 55%, but at R0 = 6.7 it rises to 85%.
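The dependence of R0 on the growth rate and on gamma-distributed latent and infectious periods can be illustrated with the standard relation for an SEIR-type model with constant infectiousness during the infectious period, $R_0 = r\,T_I\,(1 + rT_L/k_L)^{k_L} / \left[1 - (1 + rT_I/k_I)^{-k_I}\right]$, where $T_L, k_L$ and $T_I, k_I$ are the means and shapes of the latent and infectious periods. We assume, but have not verified, that this matches the formula of Ref. (14). The function names and the period values below are hypothetical placeholders, not the values sampled in the paper. The sketch also checks the 55% and 85% control thresholds quoted above and, for comparison, applies the same periods to the post-control growth rate of 0.14 per day reported later in the text.

```python
def r0_from_growth(r, mean_latent, shape_latent, mean_infectious, shape_infectious):
    """R0 implied by growth rate r (per day) for gamma-distributed periods.

    From the Euler-Lotka equation with infectiousness constant over the
    infectious period: R0 = r * T_I / (M_L(-r) * (1 - M_I(-r))), using the
    gamma moment-generating function M(-r) = (1 + r * mean / shape) ** (-shape).
    """
    inverse_latent_mgf = (1 + r * mean_latent / shape_latent) ** shape_latent
    infectious_mgf = (1 + r * mean_infectious / shape_infectious) ** (-shape_infectious)
    return r * mean_infectious * inverse_latent_mgf / (1 - infectious_mgf)


def critical_reduction(r0):
    """Fraction by which transmission must fall to push the reproduction number below 1."""
    return 1 - 1 / r0


print(f"{critical_reduction(2.2):.0%}")   # 55%
print(f"{critical_reduction(6.7):.0%}")   # 85%

# Illustrative (hypothetical) periods: latent = incubation (scenario 1, 4.2 days),
# mean infectious period 5 days, gamma shape 4 for both
before = r0_from_growth(0.29, 4.2, 4, 5.0, 4)
after = r0_from_growth(0.14, 4.2, 4, 5.0, 4)
print(f"R0 at r=0.29: {before:.1f}; at r=0.14: {after:.1f}; "
      f"reduction {1 - after / before:.0%}")
```

With exponential (shape 1) periods the relation collapses to the familiar $R_0 = (1 + rT_L)(1 + rT_I)$, which is a convenient sanity check.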
To translate this into meaningful predictions, we used the framework proposed by Lipsitch et al. (16) with the parameters we estimated for 2019-nCoV. Importantly, given the recent report of transmission of the virus from asymptomatic individuals (13), we considered the existence of a fraction of infected individuals who are asymptomatic and can transmit the virus (see Supplementary Materials). The results show that if as few as 20% of infected persons are asymptomatic and can transmit the virus, then even 95% quarantine efficacy will not contain the virus (Fig. 3B). Given the rapid rate of spread, the sensitivity of control effectiveness to asymptomatic infections and the potential for transmission before symptom onset, we need to be aware of the difficulty of controlling 2019-nCoV once it establishes itself in a new population (17). Future field, laboratory and modeling studies addressing the unknowns, such as the fraction of asymptomatic individuals, the time at which individuals become infectious and the existence of superspreaders, are needed to accurately predict the impact of various control strategies (9, 17). Fortunately, we see evidence that control efforts have a measurable effect on the rate of spread. Since January 23, Wuhan and other cities in Hubei province have implemented vigorous control measures, such as shutting down transportation and mass gatherings, while other provinces have also escalated their public health alert levels and implemented strong control measures. We noted that the growth rate of the daily number of new cases in provinces outside Hubei has slowed gradually since late January (Fig. 3B). Because of the closure of Wuhan (and other cities in Hubei), the number of cases reported in other provinces during this period should start to track local infection dynamics rather than imports from Wuhan. We estimated that the exponential growth rate has decreased to 0.14 per day (CI: 0.12 to 0.15 per day) since January 30.
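The text does not state how the post-January 30 growth rate was estimated. One simple option, shown here only as an assumption-laden sketch, is an ordinary least-squares fit of log daily new cases against time, whose slope is the exponential growth rate. The function name is hypothetical and the counts are synthetic, generated at 0.14 per day; they are not the reported data.

```python
import numpy as np


def growth_rate_loglinear(days, new_cases):
    """Slope of log(new cases) against time: the exponential growth rate per day."""
    slope, _intercept = np.polyfit(days, np.log(new_cases), deg=1)
    return slope


# Synthetic daily new cases growing at 0.14 per day (hypothetical numbers)
days = np.arange(10)                      # days since January 30
new_cases = 300.0 * np.exp(0.14 * days)

r_after = growth_rate_loglinear(days, new_cases)
print(f"growth rate: {r_after:.2f} per day, doubling time {np.log(2) / r_after:.1f} days")
```

With real, noisy counts a count model such as Poisson regression would usually be preferred, and any day with zero new cases would break the log transform.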
Based on this growth rate and an R0 between 4.7 and 6.6 before the control measures, a calculation following the formula in Ref. (14) suggests that a decrease in the growth rate from 0.29 per day to 0.14 per day translates into a 50% to 59% decrease in R0, to between 2.3 and 3.0. This is in agreement with previous estimates of the impact of effective social distancing during the 1918 influenza pandemic (18). Thus, the reduction in growth rate may reflect the impact of the vigorous control measures implemented and of individual behavior changes in China during the course of the outbreak. The 2019-nCoV epidemic is still growing rapidly and had spread to more than 20 countries as of February 5, 2020. Here, we estimated the growth rate of the early outbreak in Wuhan to be 0.29 per day (a doubling time of 2.4 days), and the basic reproductive number, R0, to be between 4.7 and 6.6 (CI: 2.8 to 11.3). Among many factors, the Lunar New Year travel rush in early and mid-January 2020 may or may not have played a role in the high outbreak growth rate, although the SARS epidemic also overlapped with the Lunar New Year travel rush. How contagious 2019-nCoV is in other countries remains to be seen. If R0 is as high in other countries, our results suggest that active and strong population-wide social distancing efforts, such as closing down transportation systems and schools and discouraging travel, might be needed to reduce overall contacts and contain the spread of the virus.
Supplementary Text; Figs. S1 to S5; Tables S1 to S3
We collected and translated reports from documents published daily on the China CDC website and the official websites of health commissions across provinces and special municipalities in China (website URLs are available upon request). We collected daily counts of confirmed cases in each province as well as 140 individual case reports (Table S1).
Many of the individual reports were also published on the China CDC official website (http://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/) and in the China CDC weekly bulletin (in English) (http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm). Our dataset includes demographic information (age and sex) as well as epidemiological information (dates of symptom onset, hospitalization, case confirmation, discharge or death). Most of the health commissions in provinces and special municipalities documented and published detailed information on the first or first few cases confirmed with 2019-nCoV infection. As a result, this dataset includes case reports of many of the first or first few individuals confirmed with the virus infection in each province, for whom dates of departure from Wuhan were available. We used the Baidu® Migration server (https://qianxi.baidu.com/) to estimate the number of daily travelers into and out of Wuhan (Table S2). Specifically, we extracted from the server the Immigration Index and Emigration Index for Wuhan, which are linearly related to the number of travelers going into and out of Wuhan, respectively, based on cell phone positioning data. We also extracted the fraction of individuals who went to or came from each province. It has been reported that 5 million people left Wuhan between the start of the Chinese New Year travel rush and January 23 (https://www.washingtonpost.com/world/asia_pacific/china-coronavirus-liveupdates/2020/01/30/1da6ea52-4302-11ea-b5fc-eefa848cde99_story.html; accessed Feb. 2, 2020). This allowed us to calibrate the Emigration Index and estimate the number of daily travelers to or from a particular province, and thus the fraction of people traveling to or from each province (Table S3).
These data were used in mathematical models to estimate the s
We used the first confirmed cases in provinces other than Hubei to inform the time between patient infection and the onset of symptoms (n = 24). These individuals had all traveled to Wuhan shortly before symptom onset. Since they were the first cases detected in their province, it is likely that the infection occurred during their recent stay in Wuhan. We approximated the time of infection as the midpoint of their stay. Because the delay between infection and symptom onset varies between patients, we modeled it using a gamma distribution, as its support is nonnegative and it permits relatively large delays compared with the median. Figure 1 in the main text presents the results of fitting the distribution to the data. The fitting procedure maximized the likelihood of the observed delays between infection and symptom onset. For a single observation, the individual likelihood is the gamma density function evaluated at the infection-to-onset delay. Some of the delays were censored, i.e. only bounded by a certain value. For example, in some cases only the times of infection and hospitalization were reported, and the time of symptom onset was missing from the case report. In such cases, we assumed that the missing onset time lies between the times of infection and hospitalization. The likelihood for this observation is then the cumulative gamma distribution evaluated at the censoring value, i.e. the delay from infection to hospitalization. The maximum likelihood estimates (MLEs) are the shape and scale parameters that maximize the sum of the individual log-likelihoods over all observations. We used differential_evolution from the scipy.optimize library (Python) to perform the maximization; this stochastic optimization algorithm was used to avoid being trapped in local optima.
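The censored maximum-likelihood fit described above can be sketched as follows: exact delays contribute the gamma log-density, censored delays contribute the log-CDF at their upper bound, and the negative log-likelihood is minimized with `scipy.optimize.differential_evolution`. The helper names, parameter bounds, random seed and synthetic data are assumptions for illustration only; this is not the authors' code, and the confidence-interval step (profile likelihood following Raue et al.) is omitted.

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import gamma


def neg_log_likelihood(params, exact, upper_bounds):
    """Negative log-likelihood of a gamma(shape, scale) delay model.

    exact        -- fully observed infection-to-onset delays (days)
    upper_bounds -- delays known only to be <= a bound, e.g. when onset is
                    missing and only infection-to-hospitalization is known
    """
    shape, scale = params
    ll = gamma.logpdf(exact, shape, scale=scale).sum()
    ll += gamma.logcdf(upper_bounds, shape, scale=scale).sum()  # censored terms
    if not np.isfinite(ll):
        return 1e300  # guard against log(0) at extreme parameter values
    return -ll


def fit_gamma(exact, upper_bounds, seed=0):
    bounds = [(0.1, 20.0), (0.01, 10.0)]  # search box for (shape, scale): an assumption
    result = differential_evolution(
        neg_log_likelihood, bounds, args=(exact, upper_bounds), seed=seed
    )
    return result.x  # MLE (shape, scale)


# Synthetic delays (hypothetical, not the case reports in Table S1)
rng = np.random.default_rng(42)
delays = rng.gamma(5.0, 0.84, size=350)                       # true mean 4.2 days
exact = delays[:300]
upper_bounds = delays[300:] + rng.gamma(2.0, 1.0, size=50)    # onset unknown

shape_hat, scale_hat = fit_gamma(exact, upper_bounds)
print(f"shape={shape_hat:.2f}, scale={scale_hat:.2f}, mean={shape_hat * scale_hat:.2f} days")
```

The same function fits the other delays (onset to hospitalization, hospitalization to discharge or death) by passing the corresponding exact and censored observations.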
For the differential evolution algorithm, see Ref. (1); the likelihood-based confidence intervals were computed using the methods reported in Raue et al. (2). A similar approach was adopted to fit distributions to the time between symptom onset and hospitalization (n = 96), between hospitalization and discharge (n = 6), and between hospitalization and death (n = 23). The reported dates for these events were obtained directly from official sources. Data from cases originating from all over China and neighboring countries were used for distribution fitting. Detailed patient-level data are provided in Table S1.
In this model, we used the first-arrival time of a patient who traveled from Wuhan to a specific other province and was later confirmed to have been infected by 2019-nCoV. The rationale behind our approach is that an increasing fraction of people infected in Wuhan increases the likelihood that one such case is exported to other provinces. Hence, how soon new cases are observed in other provinces can inform the disease progression in Wuhan. We hypothesize that this information is more reliable because the infected population in Wuhan needs to be sufficiently large to make the export of one infected individual probable. The flow of expected cases depends on the flow of travelers to each province and on the proportion of the Wuhan population infected by the virus. We first estimated the daily number of travelers from Wuhan to each Chinese province. For this purpose, we used Wuhan's daily migration index to other provinces and the daily distribution of traveler destinations from Wuhan (see Data Collection). Assuming linearity between the migration index and the total number of exported individuals, one unit of the migration index corresponds approximately to 5 million individuals divided by the sum of the migration indexes from January 10 to January 25, 2020 (it was reported that 5 million individuals left Wuhan during that period; see Data Collection section).
The total number of daily Wuhan travelers to a province on a given date was then set equal to the number of travelers estimated from the migration index times the fraction of travelers going to that province. The results are reported in Table S2. An infected traveler may be pre-symptomatic, i.e. exposed to the virus ($E$) but not yet symptomatic, or already symptomatic ($I$). In fact, for many individuals, symptom onset was recorded days after their departure from Wuhan (see Table S1). Assuming that travelers represent a random sample of the whole population, the probability that a traveler is infected equals the number of exposed or infected individuals in Wuhan ($I^* = E + I$) divided by the total Wuhan population ($N(t)$). The total population size varied during the infection period; we estimated it using the daily inflow and outflow of individuals from Wuhan (see Table S2). To represent the beginning of an outbreak, we modeled an exponential increase in the size of the exposed and infected population over time $t$:
$$I^*(t) = \exp\left(r\,(t - t_0)\right), \qquad (1)$$
where $r$ is the infection growth rate and $t_0$ is the time of onset of the exponential outbreak. Equation (1) allows a simple analytic expression for the likelihood of the arrival times of the first cases in each province other than Hubei. For a specific province, indexed by $i$, we modeled the arrival of new cases during short time intervals as a Poisson random process with rate $\lambda_i(t)$. Note that this rate,
$$\lambda_i(t) = \frac{I^*(t)\,F_i(t)}{N(t)},$$
depends on the time-varying sum of the exposed and symptomatic populations $I^*(t)$, the time-varying flow of people $F_i(t)$ transported from Wuhan to province $i$, and the time-varying population size $N(t)$.
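The calibration of the migration index and the resulting Poisson arrival rate $\lambda_i(t) = I^*(t)F_i(t)/N(t)$ can be sketched as follows. The function names, the index values, the 5% destination share and the fixed population of 14 million are hypothetical placeholders; the text updates $N(t)$ with the daily inflow and outflow, which this sketch omits.

```python
import numpy as np

REPORTED_OUTFLOW = 5_000_000  # people reported to have left Wuhan in the window


def people_per_index_unit(index_window):
    """Linear calibration: one index unit ~ reported outflow / sum of the index."""
    return REPORTED_OUTFLOW / np.sum(index_window)


def daily_travellers(index, share, per_unit):
    """F_i(t): travellers per day from Wuhan to a province with the given destination share."""
    return index * per_unit * share


def arrival_rate(t, r, t0, flow, population):
    """lambda_i(t) = I*(t) F_i(t) / N(t), with I*(t) = exp(r (t - t0)) from Eq. (1)."""
    return np.exp(r * (t - t0)) * flow / population


# Hypothetical daily migration index for the 16 days January 10-25
index = np.linspace(4.0, 8.0, 16)
per_unit = people_per_index_unit(index)
flow_i = daily_travellers(index, share=0.05, per_unit=per_unit)   # 5% share, assumed

t = np.arange(10, 26)          # days since January 1 (t = 0 at 0:00, January 1)
lam = arrival_rate(t, r=0.29, t0=-12.0, flow=flow_i, population=14e6)
print(np.round(lam, 2))
```

By construction, the calibrated daily flows over the window add up to the reported 5 million departures when the destination share is 1.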
It can be shown mathematically (3) that the probability that no exposed or symptomatic traveler arrives in province $i$ during a short time interval $(t, t + \Delta t)$, $\Delta t \ll 1$, is:
$$\mathbb{P}\{\text{no arrival in } (t, t + \Delta t)\} \approx e^{-\lambda_i(t)\,\Delta t}. \qquad (2)$$
We assume no delay due to transportation in our model. Equation (2) is valid for any $t > 0$, and because the overall process is Markovian, we can formulate the probability that the arrival time of the first case in province $i$, $T_i$, is later than $t$ as:
$$\mathbb{P}\{T_i > t\} = \lim_{K \to \infty} \prod_{k=0}^{K-1} e^{-\lambda_i(t_0 + k\Delta t)\,\Delta t} = \exp\left(-\int_{t_0}^{t} \lambda_i(s)\,ds\right), \qquad (3)$$
where $[t_0, t)$ was partitioned into $K$ equal intervals of length $\Delta t = (t - t_0)/K$, and we converted the Riemann sum into an integral in the limit $K \to \infty$ ($\Delta t \to 0$). Finally, we apply $d/dt$ to $1 - \mathbb{P}\{T_i > t\}$ to obtain the probability density function (PDF) of the first-arrival time in province $i$:
$$f_i(t) = \lambda_i(t)\,\exp\left(-\int_{t_0}^{t} \lambda_i(s)\,ds\right). \qquad (4)$$
The PDF in Eq. (4) was used to compute the likelihood of the observed arrival times in each province as a function of the growth rate $r$ and the outbreak initiation time $t_0$. This likelihood was maximized, again using differential_evolution in scipy.optimize (1), and the confidence intervals for $r$ and $t_0$ were obtained through the profile likelihood. The arrival times were also fitted using three versions of the above model. Each version made a different assumption about the probability that an infected or exposed individual who arrived at a location is later diagnosed with coronavirus. In the first sensitivity analysis, we assumed that this probability was 50%. In the second, we assumed it to be 10%. Finally, we tested the assumption that this probability was 0% for cases that arrived before December 31, 2019, and 50% for arrivals after that date. The model formulation above needed a small modification to perform these analyses.
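A minimal sketch of the first-arrival likelihood of Eq. (4), under a simplifying assumption not made in the text: each province's travel fraction $c_i = F_i/N$ is held constant, so the integral of $\lambda_i$ has the closed form $c_i(e^{r(t - t_0)} - 1)/r$. A detection probability `p` (1 for Eq. (4); smaller values give the detection-adjusted PDF used in the sensitivity analyses) scales the rate. Arrival times are simulated by inverting the survival function $\mathbb{P}\{T_i > t\}$. All function names and numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution


def neg_log_likelihood(params, arrivals, c, p=1.0):
    """-log L of first-arrival times: Eq. (4) for p = 1, thinned rate for p < 1.

    arrivals -- first-arrival time per province (days since January 1)
    c        -- per-province daily travel fraction F_i / N, assumed constant
    p        -- probability that an infected arrival is later diagnosed
    """
    r, t0 = params
    tau = arrivals - t0
    if np.any(tau <= 0):
        return 1e12  # an arrival before t0 is impossible under Eq. (1)
    rate = p * c * np.exp(r * tau)               # p * lambda_i(t)
    cumulative = p * c * np.expm1(r * tau) / r   # p * integral of lambda_i from t0
    return -np.sum(np.log(rate) - cumulative)


def simulate_arrivals(r, t0, c, rng, p=1.0):
    """Draw first-arrival times by inverting P{T_i > t} = exp(-p * integral)."""
    u = rng.uniform(size=c.shape)
    return t0 + np.log1p(-r * np.log(u) / (p * c)) / r


rng = np.random.default_rng(1)
c = rng.uniform(1e-4, 1e-3, size=26)    # hypothetical travel fractions, 26 provinces
true_r, true_t0 = 0.29, -12.0           # -12 = December 20 with t = 0 on January 1
arrivals = simulate_arrivals(true_r, true_t0, c, rng)

bounds = [(0.05, 1.0), (-60.0, 10.0)]   # search box for (r, t0): an assumption
result = differential_evolution(neg_log_likelihood, bounds, args=(arrivals, c), seed=0)
r_hat, t0_hat = result.x
print(f"p = 1.0: r = {r_hat:.2f} per day, t0 = {t0_hat:.1f} days")

# Sensitivity analysis: refit assuming only 10% of infected arrivals are diagnosed
result_10 = differential_evolution(
    neg_log_likelihood, bounds, args=(arrivals, c, 0.1), seed=0
)
print(f"p = 0.1: r = {result_10.x[0]:.2f} per day, t0 = {result_10.x[1]:.1f} days")
```

Because $p$ enters the rate only through the product $p\,c_i e^{r(t - t_0)}$, a lower detection probability is absorbed, to a close approximation, by an earlier $t_0$ (a shift of about $\ln p / r$ days) while $r$ stays essentially unchanged, the same pattern as the sensitivity results reported in the text.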
The event $A$: "no new arrival before time $t$ is later diagnosed with the infection" is now equivalent to "no infected individual arrived before time $t$", or "one infected arrival before time $t$ remained undiagnosed", or "two infected arrivals before time $t$ remained undiagnosed", etc. For a Poisson process with fixed parameter $\Lambda$ (here $\Lambda = \int_{t_0}^{t} \lambda_i(s)\,ds$), the probability of $A$ can be expressed as:
$$\mathbb{P}(A) = \sum_{k=0}^{\infty} \frac{\Lambda^k e^{-\Lambda}}{k!}\,(1 - p)^k = e^{-p\Lambda}, \qquad (5)$$
where $p$ is the probability of detection. It follows that the modified PDF for the sensitivity analyses is:
$$f_i(t) = p\,\lambda_i(t)\,\exp\left(-p\int_{t_0}^{t} \lambda_i(s)\,ds\right). \qquad (6)$$
This PDF was used instead of Eq. (4) to obtain maximum likelihood estimates of the growth rate and outbreak initiation date in the sensitivity analyses. The resulting estimates for the hypothetical situations above are as follows. When the probability of detecting a case was set to 50%, the estimated growth rate was 0.29 per day and the estimated outbreak initiation date was December 18, 2019. The same estimates were obtained when we assumed that no case could be detected among individuals who arrived from Wuhan before December 31, 2019. When the probability of detection was set to 10%, the estimated growth rate remained 0.29 per day, but the estimated outbreak initiation date was December 12, 2019.
Model 1 fitted the time of arrival of the first confirmed case in each province. We used a different approach and a different dataset to infer the disease dynamics. In particular, we constructed a hybrid stochastic model that infers the disease dynamics from all confirmed cases outside Hubei. Since measurements in Wuhan, Hubei may have been biased early in the outbreak, our aim is to use data from outside Hubei to infer the growth rate $r$ and the onset time $t_0$ (with $t = 0$ defined as 0:00 on January 1, 2020), defined as the time when the sum of the exposed and symptomatic populations in Wuhan is approximately 1.
The model is hybrid in the sense that we couple deterministic exponential growth describing the outbreak in Wuhan with an agent-based model describing the discrete population dynamics of patients after they left Hubei for other provinces. A schematic diagram of the hybrid meta-population model is presented in Supplementary Fig. 6. We assume exponential growth of the exposed ($E_W$, W for Wuhan) and symptomatic ($I_W$) populations in Wuhan from the onset, $E_W(t) = E_W(0)\,e^{rt}$ and $I_W(t) = I_W(0)\,e^{rt}$. The overall growth rate is dominated by the largest eigenvalue of a sequential compound process, and given a value of $r$, the ratio $\rho(r) := E_W(0)/I_W(0)$ is asymptotically constant (4). Thus, given a growth rate parameter $r$ and the initial condition $E_W(t_0) + I_W(t_0) = 1$, we numerically compute the exposed population $E_W(t) = \rho(r)\,(1 + \rho(r))^{-1} \exp\left(r(t - t_0)\right)$ and the symptomatic population $I_W(t) = (1 + \rho(r))^{-1} \exp\left(r(t - t_0)\right)$. We assume that between 1/1 and 1/26 the populations in Wuhan are large and their dynamics can reasonably be approximated by these deterministic, exponentially growing curves. However, the initial propagation of the disease to other provinces in China involves only a small number of exposed ($E_O$, O for other provinces) or symptomatic ($I_O$) individuals who left Hubei province. In addition, the transitions of these patients between phases, from exposed ($E_O$) to symptomatic ($I_O$), then to hospitalized ($H_O$), and finally to confirmed by laboratory examination ($C_O$) in other provinces, are also variable (as quantified in Fig. 1C-F). Consequently, the resulting population dynamics in other provinces are highly stochastic. We thus adopt an agent-based modeling approach and rely on kinetic Monte Carlo sampling techniques, detailed below, to simulate the population dynamics in other provinces.
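The asymptotically constant ratio $\rho(r) = E_W/I_W$ can be illustrated with the simplest linear exposed-infectious system, $dE/dt = \beta I - \sigma E$ and $dI/dt = \sigma E - \gamma I$: both compartments grow at the largest eigenvalue $r$ of the system matrix, and $\rho$ is the ratio of the components of the corresponding eigenvector, which equals $(r + \gamma)/\sigma$. This one-stage model, the function names and the rate values are assumptions for illustration; the sequential compound process of Ref. (4) may use more stages.

```python
import numpy as np


def dominant_mode(beta, sigma, gamma_):
    """Growth rate r and ratio rho = E/I of the linear system
    dE/dt = beta*I - sigma*E,  dI/dt = sigma*E - gamma_*I."""
    matrix = np.array([[-sigma, beta], [sigma, -gamma_]])
    eigvals, eigvecs = np.linalg.eig(matrix)
    k = np.argmax(eigvals.real)          # the largest eigenvalue dominates growth
    r = eigvals[k].real
    e_part, i_part = eigvecs[:, k].real
    return r, e_part / i_part


def split_wuhan(t, r, t0, rho):
    """E_W(t) and I_W(t) with E_W(t0) + I_W(t0) = 1, as in the text."""
    total = np.exp(r * (t - t0))
    return rho / (1 + rho) * total, total / (1 + rho)


# Hypothetical rates per day: 1/4.2 for E -> I, 1/5 for removal, beta chosen freely
r, rho = dominant_mode(beta=1.0, sigma=1 / 4.2, gamma_=1 / 5)
exposed, symptomatic = split_wuhan(np.array([0.0, 10.0]), r, 0.0, rho)
print(f"r = {r:.3f} per day, rho = {rho:.3f}")
```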
With this approach, we aim to generate samples of (1) each individual patient who left Wuhan on a specific date, and (2) the individual's health status as time progresses (susceptible, exposed, or symptomatic). The goal is to accumulate a large number of Monte Carlo samples from which we can compute the key summary statistic, i.e. the average number of cases reported on each day between 1/18 and 1/26, to be compared against the data. We achieve this through the following algorithmic procedures.
We used the migration index, which quantifies the fraction $m(d)$ of the total population (14 million) of Wuhan that traveled to other provinces on each date $d = 1, \ldots, 26$ (see Table S3). Assuming independence between an individual's health state (susceptible, exposed, or symptomatic) and the individual's migration decision (leaving for other provinces or not), on each date $d$ the exposed and symptomatic populations leaving Hubei can be modeled as binomial random variables (sums of independent Bernoulli trials), $E_O(d) \sim \mathrm{Binomial}(E_W(d), m(d))$ and $I_O(d) \sim \mathrm{Binomial}(I_W(d), m(d))$. Here, $E_W(d)$ and $I_W(d)$ are the exposed and symptomatic populations in Wuhan, rounded to the nearest integers of the previously prescribed exponential growth given the model parameters $(r, t_0)$. Thus, to generate one stochastic sample path (realization), we generate binomially distributed random populations leaving Hubei on each day between 1/1 and 1/26 (both included), and model each in-silico patient's health state by the following procedure.
Generate the progression of the health state for each patient: we assume that each hypothetical patient generated by the above procedure progresses stochastically, identically and independently toward confirmation ($C$) and reporting in one of the other provinces. If an individual was exposed ($E$) when s/he left Hubei on date $d$, we generate a gamma-distributed random time $\Delta t_{E \to I} \sim \Gamma(k_1, \theta_1)$ and update the individual's health state to symptomatic ($I$) at time $d + \Delta t_{E \to I}$.
We chose a time-dependent waiting-time distribution for the progression from the symptomatic state to reflect the two regimes observed in the data (see main text). If $d + \Delta t_{E \to I}$ is on or before 1/18, we draw a Gamma-distributed random time $\Delta t_{I \to H} \sim \Gamma(\alpha_{2,1}, \beta_{2,1})$ to model the waiting time for an infected patient to be hospitalized; if it is later than 1/18, $\Delta t_{I \to H} \sim \Gamma(\alpha_{2,2}, \beta_{2,2})$. The patient's state then changes to $H$ at time $d + \Delta t_{E \to I} + \Delta t_{I \to H}$. If this time is before 1/19, the patient waits in the $H$ state until 1/19, when the case-confirmation policy was announced and institutionalized. The confirmation process is then modeled by another Gamma-distributed random time $\Delta t_{H \to C} \sim \Gamma(\alpha_3, \beta_3)$. The patient is confirmed and reported at time $d + \Delta t_{E \to I} + \Delta t_{I \to H} + \Delta t_{H \to C}$ (with the hospitalization time replaced by 1/19 if the patient had to wait), and we add one case report to the next integer date (day of January). A similar procedure applies to a patient who had already progressed to the $I$ state before leaving Hubei on date $d$, except that the first random waiting time is omitted: that patient's confirmation time is $d + \Delta t_{I \to H} + \Delta t_{H \to C}$. We repeat the procedure for each in-silico patient who left Wuhan between 1/1 and 1/26 (both included), and record the times at which these patients were reported between 1/18 and 1/26 (both included).

Supplementary Fig. 6. Schematic of the hybrid meta-population model. In Wuhan, a susceptible individual ($S_W$) is first exposed ($E_W$), progresses to being infected ($I_W$), is hospitalized ($H_W$), becomes a confirmed case ($C_W$), and either recovers ($R_W$) or dies ($D_W$). A portion of the ill population ($E_W$ and $I_W$) moved to other provinces and followed a similar progression. Because these populations are small and their dynamics are therefore stochastic, we adopt an agent-based approach to simulate the disease dynamics ($E_O(t)$, $I_O(t)$, $H_O(t)$ and $C_O(t)$) in other provinces. The case reports on each day in other provinces were compared against the model output, $C_O(t)$, to constrain the unknown onset time and growth rate of the epidemic in Wuhan.
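A sketch of the per-patient progression follows. All Gamma (shape, scale) pairs are hypothetical placeholders: the means of 4.2, 5.5 and 1.5 days echo the main-text estimates, but the shapes and the 2-day confirmation delay are assumptions, not the fitted values behind Fig. 1C-F. Time t = d stands for date January d, and for travellers who are already symptomatic the departure date is used as the onset time when choosing the hospitalization regime, which is also an assumption.

```python
import math
import random

# Hypothetical Gamma (shape, scale) pairs; mean = shape * scale.
EXPOSED_TO_SYMPTOMATIC = (4.5, 4.2 / 4.5)  # mean 4.2 days
ONSET_TO_HOSP_EARLY = (3.0, 5.5 / 3.0)     # onset on or before 1/18: mean 5.5 days
ONSET_TO_HOSP_LATE = (3.0, 1.5 / 3.0)      # onset after 1/18: mean 1.5 days
HOSP_TO_CONFIRMED = (2.0, 1.0)             # placeholder: mean 2 days
REGIME_SWITCH = 18.0       # end of the early regime (1/18)
CONFIRMATION_START = 19.0  # confirmations only from 1/19 onward


def report_date(leave_date, state, rng):
    """Day of January on which one traveller ('E' or 'I' at departure) is reported."""
    t = float(leave_date)
    if state == "E":  # exposed travellers first become symptomatic
        t += rng.gammavariate(*EXPOSED_TO_SYMPTOMATIC)
    # Onset time t selects the hospitalization regime.
    hosp = ONSET_TO_HOSP_EARLY if t <= REGIME_SWITCH else ONSET_TO_HOSP_LATE
    t += rng.gammavariate(*hosp)
    t = max(t, CONFIRMATION_START)  # hospitalized patients wait in H until 1/19
    t += rng.gammavariate(*HOSP_TO_CONFIRMED)
    return math.floor(t) + 1  # counted on the next integer date


def daily_reports(patients, rng, first=18, last=26):
    """Tally reports per date for a list of (leave_date, state) travellers."""
    counts = {d: 0 for d in range(first, last + 1)}
    for leave_date, state in patients:
        d = report_date(leave_date, state, rng)
        if first <= d <= last:
            counts[d] += 1
    return counts


rng = random.Random(42)
patients = [(d, "E") for d in range(1, 27)] + [(d, "I") for d in range(10, 27)]
print(daily_reports(patients, rng))
```

`random.gammavariate(alpha, beta)` takes a shape and a scale, so each tuple's mean is the product of its two entries. Because every patient is held until 1/19 before the confirmation delay starts, no report can fall on 1/18 or 1/19 in this sketch.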
It is our task to infer the unknown parameters, the exponential growth rate $\lambda$ and the onset time of exponential growth $t_0$, from the number of confirmed cases reported between 1/18 and 1/26. This is possible because the parameters $(\lambda, t_0)$ determine the deterministic growth of the exposed ($E_W$) and symptomatic ($I_W$) populations, which in turn determines the random numbers of individuals who left Hubei on each date. These individuals follow statistically quantified processes until their final confirmation outside Hubei, and the resulting case counts can be compared against the reported data. We assess the quality of fit of the model for a given parameter set $(\lambda, t_0)$ as follows. For each parameter set, we generate $M = 2^{13} = 8192$ Monte Carlo samples. On each date $d$, the $j$th sample reports a random number $C_O(d \mid \lambda, t_0, j)$ of newly confirmed cases. We average over all samples to obtain the mean number of newly confirmed cases on date $d$,

$$\bar{C}_O(d \mid \lambda, t_0) := \frac{1}{M} \sum_{j=1}^{M} C_O(d \mid \lambda, t_0, j),$$

and compare it with the data, $C(d)$. We quantify the quality of fit by the sum of the squared residuals:

$$\chi^2(\lambda, t_0) := \sum_{d=18}^{26} \left[ \bar{C}_O(d \mid \lambda, t_0) - C(d) \right]^2. \qquad (7)$$

A 100 × 100 grid-based parameter scan over the region $0.22 < \lambda < 0.42$ and $-20 \le t_0 \le -5$ identifies the best-fit parameters:

$$(\lambda^*, t_0^*) := \operatorname*{argmin}_{\{\lambda, t_0\}} \chi^2(\lambda, t_0). \qquad (8)$$

For uncertainty quantification, we formulate the logarithm of the likelihood $\mathcal{L}$ of a parameter set $(\lambda, t_0)$, up to an additive constant, as

$$\log \mathcal{L}(\lambda, t_0) := -\frac{N \, \chi^2(\lambda, t_0)}{2 \, \chi^2(\lambda^*, t_0^*)}. \qquad (9)$$

Here, $N = 9$ is the number of data points used to fit the model. This likelihood rests on two assumptions: (1) the data (the number of newly reported cases on date $d$) are normally distributed with a mean equal to the Monte Carlo mean number of newly reported cases in our model, and (2) the noise is independently and identically distributed, with variance equal to the mean squared residual of the best-fit model, $\chi^2(\lambda^*, t_0^*)/N$. Substituting this variance into the Gaussian log-density gives Eq. (9), since the normalizing term no longer depends on $(\lambda, t_0)$.
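The fitting pipeline can be sketched on a small grid, as below. Everything model-specific is a stand-in: `toy_mean` is a hypothetical deterministic curve used in place of the Monte Carlo mean of the agent-based simulation, the data are synthetic, and the grid is 21 × 16 rather than 100 × 100. The residual, best-fit, log-likelihood and uniform-prior posterior steps follow the formulas stated in the text.

```python
import math


def monte_carlo_mean(sample_path, n_samples):
    """Average daily counts over n_samples calls of sample_path() (one list per call)."""
    totals = None
    for _ in range(n_samples):
        counts = sample_path()
        totals = list(counts) if totals is None else [a + b for a, b in zip(totals, counts)]
    return [x / n_samples for x in totals]


def sum_squared_residuals(model_mean, data):
    return sum((m - y) ** 2 for m, y in zip(model_mean, data))


def grid_scan(mean_fn, data, lam_grid, t0_grid):
    """chi^2 at every grid point plus the best-fit (lam, t0)."""
    chi2 = {(lam, t0): sum_squared_residuals(mean_fn(lam, t0), data)
            for lam in lam_grid for t0 in t0_grid}
    return chi2, min(chi2, key=chi2.get)


def grid_posterior(chi2, best, n_data):
    """Posterior on the grid from log L = -N chi^2 / (2 chi^2_best), uniform prior."""
    chi2_best = max(chi2[best], 1e-300)  # guard against an exact fit
    log_l = {k: -n_data * v / (2.0 * chi2_best) for k, v in chi2.items()}
    top = max(log_l.values())
    w = {k: math.exp(v - top) for k, v in log_l.items()}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}


def marginal_growth_rate(post):
    """Marginal posterior of lam: sum the joint grid posterior over t0."""
    marg = {}
    for (lam, _), p in post.items():
        marg[lam] = marg.get(lam, 0.0) + p
    return marg


DAYS = list(range(18, 27))  # 1/18 .. 1/26, N = 9


def toy_mean(lam, t0):
    """Hypothetical stand-in for the Monte Carlo mean daily confirmations."""
    return [1e-3 * math.exp(lam * (d - t0)) for d in DAYS]


# Synthetic data from (0.3, -10) with a small alternating perturbation.
data = [m + 0.01 * (-1) ** i for i, m in enumerate(toy_mean(0.3, -10))]
lam_grid = [0.22 + 0.01 * i for i in range(21)]
t0_grid = [-20.0 + j for j in range(16)]
chi2, best = grid_scan(toy_mean, data, lam_grid, t0_grid)
post = grid_posterior(chi2, best, n_data=len(DAYS))
marg = marginal_growth_rate(post)
print(best, max(marg, key=marg.get))
```

Working with log-likelihoods and subtracting the maximum before exponentiating keeps the normalization numerically stable even when χ² varies over many orders of magnitude across the grid.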
We can then formulate a likelihood ratio test, which quantifies how likely a parameter set $(\lambda, t_0)$ is compared with the best-fit parameters $(\lambda^*, t_0^*)$:

$$\frac{\mathcal{L}(\lambda, t_0)}{\mathcal{L}(\lambda^*, t_0^*)} = \exp\left\{ -\frac{N}{2} \left[ \frac{\chi^2(\lambda, t_0)}{\chi^2(\lambda^*, t_0^*)} - 1 \right] \right\}. \qquad (10)$$

In Bayesian terms, what we computed is essentially the joint posterior distribution of the model parameters $(\lambda, t_0)$ under a uniform prior over the region of interest. We present this joint distribution in Supplementary Fig. S4. Finally, because the joint posterior is narrowly distributed, we can numerically compute the marginalized posterior,

$$\mathbb{P}\{\lambda \mid \text{data}\} \propto \int \mathcal{L}(\lambda, t_0) \, dt_0, \qquad (11)$$

which is reported in Fig. 2D-F and used to calculate the bounds of the centered 95% probability mass, which serve as the confidence interval of the growth rate $\lambda$. Assuming gamma distributions for the latent and infectious periods, Wearing et al. (4) have shown that R0 can be calculated from the estimated exponential growth rate, $r$, of an outbreak as

$$R_0 = \frac{r \left( \frac{r}{\sigma m} + 1 \right)^{m}}{\gamma \left[ 1 - \left( \frac{r}{\gamma n} + 1 \right)^{-n} \right]}, \qquad (12)$$

where $1/\sigma$ and $1/\gamma$ are the mean latent and infectious periods, respectively, and $m$ and $n$ are the shape parameters of the gamma distributions for the latent and infectious periods, respectively. To quantify the uncertainty of $R_0$, we assume that the parameters $(r, m, n, \sigma, \gamma)$ are mutually independent and generate random samples to compute the resulting $R_0$. Specifically, we generate the samples according to:

1. $r \sim \mathbb{P}\{\lambda \mid \text{data}\}$, i.e., we resample the posterior distribution of the growth rate from Eq. (11);
2. $m = 4.5$;
3. $n \sim \mathrm{Unif}(1, 6)$;
4. $1/\gamma \sim \mathrm{Unif}(2, 8)$ in the first scenario and $\mathrm{Unif}(4, 10)$ in the second scenario;
5. $\sigma \sim \mathcal{N}(\mu = 1/4.2, s = 0.0245)$ in the first scenario and $\mathcal{N}(\mu = 1/2.2, s = 0.0468)$ in the second scenario.

We generate $10^5$ parameter sets and compute their respective $R_0$ values using Eq. (12). The resulting values were binned into 40 bins to generate histograms, and we used the 2.5th and 97.5th percentiles of the generated values as the 95% confidence interval. Using a susceptible-exposed (noninfectious)-infectious-recovered (SEIR) type compartmental model, Lipsitch et al.
(5) evaluated the impact of two measures: quarantining symptomatic cases to prevent further transmission, and quarantining and closely observing asymptomatic contacts of cases so that they can be isolated when they show possible signs of the disease. Assuming that only symptomatic individuals transmit the pathogen, they showed that the reproductive number after the intervention, $R_q$, can be expressed as

$$R_q = R \left[ (1 - q) + q \, \frac{T_q}{T} \right],$$

where $R$ is the reproductive number before intervention, $q$ is the fraction of infected individuals who are quarantined, and $T_q$ and $T$ are the mean durations of the infectious period with and without intervention, respectively. Here, we adopted this formulation but assumed that a fraction, $f$, of infected individuals are asymptomatic and can transmit. In this case, quarantine of symptomatic individuals reduces only the contribution of these individuals to the reproductive number. Thus, we calculate the reproductive number under quarantine, $R_q$, as

$$R_q = f R + (1 - f) \, R \left[ (1 - q) + q \, \frac{T_q}{T} \right].$$

We also considered another form of control measure, namely population-level measures that reduce the overall number of daily contacts in the population by a fraction $c$; these include shutting down transportation systems, closing workplaces and/or schools, etc. Since $R$ depends linearly on the number of daily contacts, we calculate the combined impact of the individual-based quarantine and the population-level control measure as

$$R_{q,c} = (1 - c) \, R_q.$$

In our calculations, we assumed that the mean duration of the infectious period of 2019-nCoV is 5 days, i.e., $T = 5$ days, and that $T_q = 2$ days. We set $R$ to the maximum likelihood estimate of $R_0$ and then calculated the impact of the two types of intervention. To infer the growth rate of the number of new cases, we used linear regression on the log-transformed case counts, with the day in January 2020 as the independent variable. For this analysis, we excluded case counts below 10 because infection dynamics at such low counts may have been dominated by stochasticity.
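The R0 and control-measure calculations can be sketched together, since the control analysis starts from an R0 estimate. `r0_from_growth_rate` transcribes the Wearing et al. expression used in Eq. (12); `sample_r0` follows the five sampling steps with scenario-1 defaults; `r_under_control` encodes our reading of the quarantine and contact-reduction formulas described above. All function names are hypothetical, and the growth-rate draws in the usage example are a stand-in, not the paper's posterior.

```python
import random


def r0_from_growth_rate(r, sigma, gamma, m, n):
    """R0 from growth rate r with Gamma latent (mean 1/sigma, shape m) and
    Gamma infectious (mean 1/gamma, shape n) periods, after Wearing et al. (4)."""
    numerator = r * (r / (sigma * m) + 1.0) ** m
    denominator = gamma * (1.0 - (r / (gamma * n) + 1.0) ** (-n))
    return numerator / denominator


def sample_r0(growth_rate_samples, n_draws, rng, mean_latent=4.2, latent_sd=0.0245,
              infectious_range=(2.0, 8.0)):
    """Monte Carlo R0 draws following the five sampling steps (scenario 1 defaults)."""
    out = []
    for _ in range(n_draws):
        r = rng.choice(growth_rate_samples)            # step 1: resample posterior of r
        m = 4.5                                        # step 2: fixed latent shape
        n = rng.uniform(1.0, 6.0)                      # step 3: infectious shape
        gamma = 1.0 / rng.uniform(*infectious_range)   # step 4: 1/gamma ~ Unif
        sigma = rng.gauss(1.0 / mean_latent, latent_sd)  # step 5: Normal latent rate
        while sigma <= 0.0:  # redraw the (practically impossible) non-positive rate
            sigma = rng.gauss(1.0 / mean_latent, latent_sd)
        out.append(r0_from_growth_rate(r, sigma, gamma, m, n))
    return out


def percentile(values, q):
    """Simple percentile: nearest index on the sorted values, q in [0, 100]."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, round(q / 100.0 * (len(s) - 1))))
    return s[k]


def r_under_control(R, q, f, Tq, T, c):
    """R with quarantine of a fraction q of symptomatic cases (infectious period
    T -> Tq), asymptomatic transmitting fraction f, and contact reduction c."""
    r_symptomatic = (1.0 - f) * R * ((1.0 - q) + q * Tq / T)
    r_asymptomatic = f * R
    return (1.0 - c) * (r_symptomatic + r_asymptomatic)


rng = random.Random(0)
r_draws = [rng.gauss(0.29, 0.02) for _ in range(1000)]  # hypothetical stand-in draws
r0s = sample_r0(r_draws, 10_000, rng)
print(percentile(r0s, 2.5), percentile(r0s, 97.5))
print(r_under_control(R=5.0, q=0.5, f=0.2, Tq=2.0, T=5.0, c=0.3))
```

For the second scenario, pass `mean_latent=2.2`, `latent_sd=0.0468` and `infectious_range=(4.0, 10.0)`. With m = n = 1 the R0 formula reduces to the familiar SEIR result (1 + r/σ)(1 + r/γ), which is a convenient sanity check.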
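The log-linear growth-rate regression can be illustrated with an ordinary least-squares fit written out by hand. This sketch is an assumption-laden stand-in for the R analysis: it drops counts below 10, regresses log counts on the day of January, and reports the slope and the implied doubling time; the interaction-term test for a change after Jan. 25 is not shown, and the example counts are synthetic.

```python
import math


def log_linear_growth_rate(days, counts, min_count=10):
    """OLS slope of log(count) on day, dropping counts below min_count.

    Returns (growth_rate_per_day, doubling_time_in_days).
    """
    pts = [(d, math.log(c)) for d, c in zip(days, counts) if c >= min_count]
    if len(pts) < 2:
        raise ValueError("need at least two counts >= min_count")
    n = len(pts)
    mean_x = sum(d for d, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in pts)
    sxy = sum((d - mean_x) * (y - mean_y) for d, y in pts)
    slope = sxy / sxx
    return slope, math.log(2) / slope


# Synthetic counts growing 30% per day; the first value falls below the cutoff.
days = list(range(16, 26))
counts = [8] + [round(12 * 1.3 ** k) for k in range(9)]
print(log_linear_growth_rate(days, counts))
```

Fitting on the log scale turns exponential growth into a straight line, so the slope is the per-day growth rate and ln 2 divided by it is the doubling time.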
For cases inside Hubei, we used the numbers of cases reported between Jan. 16 and Feb. 4. For cases outside Hubei, we used the numbers of cases reported between Jan. 20 and Feb. 4. To assess whether a different growth rate was observed outside Hubei after Jan. 25, we evaluated the significance of the interaction term between the variable day and an indicator variable for dates on or after Jan. 25; the results are presented in Fig. 3C. All regressions and confidence interval estimates were obtained using the software R (15).

Figure S1. The duration from symptom onset to hospitalization decreases over time during the outbreak.

Figure S2. Predictions of the 'first arrival' model using best-fit parameters agree well with data. Probability densities of times of first arrival of infected cases in each province based on our maximum likelihood estimate (curves) and documented times of first arrival of infected individuals in our case report dataset (lines).

- Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
- A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
- Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV)
- Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
- Railway corporation using big data to trace potential virus carrier
- Pathways to zoonotic spillover
- Infectious Diseases of Humans: Dynamics and Control. Oxford science publications
- Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany
- Appropriate models for the management of infectious diseases
- Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong
- Transmission dynamics and control of severe acute respiratory syndrome
- Factors that make an infectious disease outbreak controllable
- The effect of public health measures on the 1918 influenza pandemic in U.S. cities
- Differential Evolution - a Simple and Efficient Heuristic for Global Optimization over Continuous Spaces
- Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood
- Analysis of survival data
- Appropriate Models for the Management of Infectious Diseases
- Transmission dynamics and control of severe acute respiratory syndrome

We would like to thank Alan Perelson and Christiaan van Dorp for

| Province | Before Jan 1st | Jan 1 | Jan 2 | Jan 3 | Jan 4 | Jan 5 | Jan 6 | Jan 7 | Jan 8 | Jan 9 | Jan 10 | Jan 11 | Jan 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jilin | 407 | 200 | 282 | 393 | 489 | 450 | 499 | 571 | 817 | 833 | 560 | 539 | 554 |
| Heilongjiang | 650 | 339 | 501 | 639 | 706 | 663 | 748 | 970 | 1111 | 1268 | 914 | 976 | 997 |
| Sichuan | 2789 | 1279 | 2131 | 2433 | 3096 | 3174 | 3391 | 4195 | 4934 | 5979 | 4362 | 4308 | 4264 |
| Fujian | 1961 | 940 | 1614 | 2064 | 2281 | 1989 | 2269 | 2568 | 3300 | 3696 | 2888 | 2928 | 2575 |
| Liaoning | 943 | 508 | 674 | 786 | 1032 | 1018 | 1197 | 1427 | 1405 | 1558 | 1267 | 1212 | 1025 |
| Shanxi | 1314 | 632 | 1034 | 1155 | 1602 | 1208 | 1596 | 2055 | 2712 | 3262 | 2446 | 2255 | 2132 |
| Guizhou | 2237 | 986 | 1551 | 2187 | 2553 | 2416 | 2917 | 3310 | 4248 | 4494 | 3006 | 2558 | 2409 |
| Anhui | 5200 | 2988 | 4043 | 5554 | 5812 | 5447 | 5684 | 6649 | 7908 | 8842 | 7427 | 7606 | 5871 |
| Shandong | 2825 | 1294 | 2116 | 2630 | 3123 | 3008 | 3515 | 4309 | 4738 | 5726 | 4598 | 3770 | 3655 |
| Yunnan | 1643 | 832 | 1254 | 1622 | 1738 | 1705 | 2119 | 2254 | 2843 | 2718 | 2181 | 2188 | 1911 |
| Henan | 10736 | 6362 | 7726 | 11182 | 11949 | 11416 | 12092 | 14269 | 16437 | 19243 | 15414 | 16559 | 13652 |
| Qinghai | 233 | 77 | 172 | 172 | 272 | 237 | 349 | 400 | 523 | 399 | 383 | 236 | 360 |
| Guangxi | 1915 | 801 | 1457 | 1892 | 2281 | 2108 | 2194 | 2854 | 3823 | 4566 | 3448 | 3298 | 3323 |
| Ningxia | 266 | 92 | 219 | 197 | 353 | 332 | 324 | 371 | 556 | 797 | 501 | 471 | 443 |
| Hainan | 1248 | 693 | 1050 | 1032 | 1276 | 1232 | 1521 | 1883 | 1765 | 1921 | 1356 | 1212 | 1274 |
| Beijing | 4083 | 1972 | 2977 | 3883 | 3829 | 5234 | 5260 | 5536 | 5980 | 5400 | 4804 | 3904 | 3351 |

Table S3. Continued

| Province | Jan 13 | Jan 14 | Jan 15 | Jan 16 | Jan 17 | Jan 18 | Jan 19 | Jan 20 | Jan 21 | Jan 22 | Jan 23 | Jan 24 | Jan 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Shanghai | 2872 | 2528 | 2394 | 2457 | 2294 | 2059 | 1847 | 1702 | 1578 | 1476 | 1537 | 918 | 284 |
| Tianjin | 564 | 413 | 579 | 534 | 430 | 481 | 462 | 407 | 430 | 369 | 298 | 156 | 46 |
| Chongqing | 3411 | 3476 | 4052 | 3873 | 3785 | 4359 | 4256 | 4698 | 5977 | 5482 | 4959 | 2234 | 532 |
| Guangdong | 6283 | 5299 | 5078 | 5209 | 5304 | 5561 | 5839 | 6141 | 8081 | 8223 | 7687 | 5022 | |
| Heilongjiang | 872 | 802 | 868 | 908 | 946 | 961 | 1056 | 999 | 1100 | 1054 | 893 | 416 | 150 |
| Sichuan | 3872 | 3695 | 3710 | 3846 | 3670 | 4050 | 4421 | 4476 | 5403 | 5113 | 4116 | 2078 | 677 |
| Fujian | 2616 | 2455 | 2631 | 2858 | 2982 | 3158 | 3332 | 3330 | 4016 | 3901 | 3571 | 1507 | 394 |
| Liaoning | 1077 | 997 | 947 | 1122 | 1118 | 1167 | 1254 | 1258 | 1387 | 1107 | 843 | 329 | 133 |
| Shanxi | 2051 | 1677 | 1710 | 1362 | 1634 | 1819 | 1880 | 2035 | 2486 | 2530 | 2331 | 918 | 226 |
| Guizhou | 2513 | 1945 | 1552 | 1576 | 1462 | 1545 | 1550 | 1665 | 1817 | 1581 | 1438 | 641 | 232 |
| Anhui | 5949 | 5858 | 6657 | 6491 | 7225 | 8272 | 7489 | 8398 | 10854 | 11069 | 9522 | 3412 | 1048 |
| Shandong | 3334 | 3063 | 3157 | 3312 | 3584 | 3879 | 3728 | 3811 | 4781 | 4480 | 3422 | 1593 | 492 |
| Yunnan | 2051 | 1580 | 1447 | 1549 | 1548 | 1545 | 1616 | 1813 | 2199 | 2056 | 1637 | 745 | 313 |
| Henan | 13694 | 14026 | 15866 | 17256 | 18263 | 19222 | 19332 | 23011 | 29549 | 29887 | 26384 | 8624 | 2506 |
| Qinghai | 256 | 243 | 184 | 160 | 115 | 137 | 99 | 111 | 96 | 158 | 99 | 69 | 29 |
| Guangxi | 3077 | 2455 | 2263 | 2164 | 2236 | 2059 | 2507 | 2664 | 2964 | 2899 | 2579 | 1143 | 411 |
| Ningxia | 385 | 267 | 210 | 240 | 143 | 240 | 264 | 222 | 239 | 211 | 99 | 87 | 35 |
| Hainan | 1026 | 948 | 1000 | 1122 | 1089 | 1098 | 1287 | 1369 | 1721 | 1739 | 1587 | 866 | 394 |
| Beijing | 3693 | 3452 | 3420 | 3419 | 3068 | 2506 | 2243 | 2072 | 2247 | 2056 | 1785 | 675 | 243 |