key: cord-339789-151d1j4n
authors: Hong, Hyokyoung G.; Li, Yi
title: Estimation of time-varying reproduction numbers underlying epidemiological processes: A new statistical tool for the COVID-19 pandemic
date: 2020-07-21
journal: PLoS One
DOI: 10.1371/journal.pone.0236464
sha: 
doc_id: 339789
cord_uid: 151d1j4n

The coronavirus pandemic has rapidly evolved into an unprecedented crisis. The susceptible-infectious-removed (SIR) model and its variants have been used for modeling the pandemic. However, time-independent parameters in the classical models may not capture the dynamic transmission and removal processes, governed by virus containment strategies taken at various phases of the epidemic. Moreover, few models account for possible inaccuracies of the reported cases. We propose a Poisson model with time-dependent transmission and removal rates to account for possible random errors in reporting and estimate a time-dependent disease reproduction number, which may reflect the effectiveness of virus control strategies. We apply our method to study the pandemic in several severely impacted countries, and analyze and forecast the evolving spread of the coronavirus. We have developed an interactive web application to facilitate readers’ use of our method.

A recent work [35] demonstrated that R 0 is likely to vary "due to the impact of the performed intervention strategies and behavioral changes in the population".

The merits of our work are summarized as follows. First, unlike the deterministic ODEbased SIR models, our method does not require transmission and removal rates to be known, but estimates them using the data. Second, we allow these rates to be time-varying. Some timevarying SIR approaches [27] directly integrate into the model the information on when governments enforced, for example, quarantine, social-distancing, compulsory mask-wearing and city lockdowns. Our method differs by computing a time-varying R 0 , which gauges the status of coronavirus containment and assesses the effectiveness of virus control strategies. Third, our Poisson model accounts for possible random errors in reporting, and quantifies the uncertainty of the predicted numbers of susceptible, infectious and removed. Finally, we apply our method to analyze the data collected from the aforementioned GitHub time-series data repository. We have created an interactive web application (https://younghhk.shinyapps.io/ tvSIRforCOVID19/) to facilitate users' application of the proposed method.

We introduce a Poisson model with time-varying transmission and removal rates, denoted by β(t) and γ(t). Consider a population with N individuals, and denote by S(t), I(t), R(t) the true but unknown numbers of susceptible, infectious and removed, respectively, at time t, and by s (t) = S(t)/N, i(t) = I(t)/N, r(t) = R(t)/N the fractions of these compartments.

The following ordinary differential equations (ODE) describe the change rates of s(t), i(t) and r(t):

with an initial condition: i(0) = i 0 and r(0) = r 0 , where i 0 > 0 in order to let the epidemic develop [36] . Here, β(t) > 0 is the time-varying transmission rate of an infection at time t, which is the number of infectious contacts that result in infections per unit time, and γ(t) > 0 is the time-varying removal rate at t, at which infectious subjects are removed from being infectious due to death or recovery [33] . Moreover, γ −1 (t) can be interpreted as the infectious duration of an infection caught at time t [37] . From (1)-(3), we derive an important quantity, which is the time-dependent reproduction number R 0 ðtÞ ¼ bðtÞ gðtÞ :

Time-varying SIR based Poisson model for COVID -19 To see this, dividing (2) by (3) leads to

where (di/dr)(t) is the ratio of the change rate of i(t) to that of r(t). Therefore, compared to its time-independent counterpart, R 0 ðtÞ is an instantaneous reproduction number and provides a real-time picture of an outbreak. For example, at the onset of the outbreak and in the absence of any containment actions, we may see a rapid ramp-up of cases compared to those removed, leading to a large (di/dr)(t) in (4), and hence a large R 0 ðtÞ. With the implemented policies for disease mitigation, we will see a drastically decreasing (di/dr)(t) and, therefore, declining of R 0 ðtÞ over time. The turning point is t 0 such that R 0 ðt 0 Þ ¼ 1; when the outbreak is controlled with (di/dr)(t 0 ) < 0. Under the fixed population size assumption, i.e., s(t) + i(t)+ r(t) = 1, we only need to study i (t) and r(t), and re-express (1)-(3) as

with the same initial condition.

As the numbers of cases and removed are reported on a daily basis, t is measured in days, e.g. t = 1, . . ., T. Replacing derivatives in (5) with finite differences, we can consider a discrete version of (5):

iðt þ 1Þ À iðtÞ ¼ bðtÞiðtÞf1 À iðtÞ À rðtÞg À gðtÞiðtÞ; rðt þ 1Þ À rðtÞ ¼ gðtÞiðtÞ;

where β(t) and γ(t) are positive functions of t. We set i(0) = i 0 and r(0) = r 0 with t = 0 being the starting date. Model (6) admits a recursive way to compute i(t) and r(t):

iðt þ 1Þ ¼ f1 þ bðtÞ À gðtÞgiðtÞ À bðtÞiðtÞfiðtÞ þ rðtÞg;

for t = 0, . . ., T − 1. The first equation of (7) implies that β(t) < γ(t) or R 0 ðtÞ ¼ bðtÞg À 1 ðtÞ < 1 leads to that i(t + 1) < i(t) or the number of infectious cases drops, meaning the spread of virus is controlled; otherwise, the number of infectious cases will keep increasing.

To fit the model and estimate the time-dependent parameters, we can use nonparametric techniques, such as splines [38] [39] [40] [41] [42] [43] , local polynomial regression [44] and reproducible kernel Hilbert space method [45] . In particular, we consider a cubic B-spline approximation [46] . Denote by B(t) = {B 1 (t),. . .,B q (t)} T the q cubic B-spline basis functions over [0, T] associated with the knots 0 = w 0 < w 1 < . . . < w q−2 < w q−1 = T. For added flexibility, we allow the number of knots to differ between β(t) and γ(t) and specify log bðtÞ ¼

When b 1 ¼ � � � ¼ b q 1 and g 1 ¼ � � � ¼ g q 2 , the model reduces to a constant SIR model [46] . We use cross-validation to choose q 1 and q 2 in our numerical experiments.

Denote by β ¼ ðb 1 ; . . . ; b q 1 Þ and γ ¼ ðg 1 ; . . . ; g q 2 Þ the unknown parameters, by Z I (t) and Z R (t) the reported numbers of infectious and removed, respectively, and by z I (t) = Z I (t)/N and z R (t) = Z R (t)/N, the reported proportions. Also, denote by I(t) and R(t) the true numbers of infectious and removed, respectively at time t. We propose a Poisson model to link Z I (t) and Z R (t) to I(t) and R(t) as follows:

We also assume that, given I(t) and R(t), the observed daily number {Z I (t), Z R (t)} are independent across t = 1, . . ., T, meaning the random reporting errors are "white" noise. We note that (9) is directly based on "true" numbers of infectious cases and removed cases derived from the discrete SIR model (6) . This differs from the Markov process approach, which is based on the past observations. With (6), (7) and (8), R(t) and I(t) are the functions of β and γ, since

Given the data (Z I (t), Z R (t)), t = 1, . . ., T, we obtain ðβ;ĝÞ, the estimates of (β, γ), by maximizing the following likelihood or, equivalently, maximizing the log likelihood function

where C is a constant free of β and γ. See the S1 Appendix for additional details of optimization.

We then estimate the variance-covariance matrix of ðβ;γÞ by inverting the second derivative of −ℓ(β, γ) evaluated at ðβ;γÞ. Finally, for t = 1, . . ., T, we estimate I(t) and R(t) byÎðtÞ ¼ NîðtÞ andRðtÞ ¼ NrðtÞ, whereîðtÞ andrðtÞ are obtained from (7) with all unknown quantities replaced by their estimates; estimate β(t) and γ(t) bybðtÞ andĝðtÞ, obtained by using (8) with (β, γ) replaced by ðβ;γÞ; and estimate R 0 ðtÞ byR 0 ðtÞ ¼βðtÞ=γðtÞ.

Estimation: Let N be the size of population of a given country. The date when the first case was reported is set to be the starting date with t = 1, i 0 = Z I (1)/N and r 0 = Z R (1)/N. The observed data are {Z I (t), Z R (t), t = 1, . . ., T}, obtained from the GitHub data repository website mentioned in the introduction. We maximize (10) to obtainβ ¼ ðb 0 ;b 1 ; . . . ;b q 1 Þ and γ ¼ ðĝ 0 ;ĝ 1 ; . . . ;ĝ q 2 Þ. The optimal q 1 and q 2 are obtained via cross-validation. We denote by 

Since the first case of COVID-19 was detected in China, it quickly spread to nearly every part of the world [6] . COVID-19, conjectured to be more contagious than the previous SARS and H1N1 [48] , has put great strain on healthcare systems worldwide, especially among the severely affected countries [49] . We apply our method to assess the epidemiological processes of COVID-19 in some severely impacted countries.

The country-specific time-series data of confirmed, recovered, and death cases were obtained from a GitHub data repository website (https://github.com/ulklc/covid19-timeseries). This site collects information from various sources listed below on a daily basis at GMT 0:00, converts the data to the CSV format, and conducts data normalization and harmonization if inconsistencies are found. The data sources include In particular, the current population size of each country, N, came from the website of WorldoMeter. Our analyses covered the periods between the date of the first reported coronavirus case in each nation and June 30, 2020. In the beginning of the outbreak, assessment of i 0 and r 0 was problematic as infectious but asymptomatic cases tended to be undetected due to lack of awareness and testing. To investigate how our method depends on the correct specification of the initial values r 0 and i 0 , we conducted Monte Carlo simulations. As a comparison, we also studied the performance of the deterministic SIR model in the same settings. Fig 1  shows that, when the initial value i 0 was mis-specified to be 5 times of the truth, the curves of i (t) and r(t) obtained by the deterministic SIR model (6) were considerably biased. On the other hand, our proposed model (9), by accounting for the randomness of the observed data, was robust toward the mis-specification of i 0 and r 0 : the estimates of r(t) and i(t) had negligible biases even with mis-specified initial values. In an omitted analysis, we mis-specified i 0 and r 0 to be only twice of the truth, and obtain the similar results.

Our numerical experiments also suggested that using the time series, starting from the date when both cases and removed were reported, may generate more reasonable estimates.

Using the cubic B-splines (8), we estimated the time-dependent transmission rate β(t) and removal rate γ(t), based on which we further estimated R 0 ðtÞ, I(t) and R(t). To choose the optimal number of knots for each country when implementing the spline approach, we used 5-fold cross-validation by minimizing the combined mean squared error for the estimated infectious and removed cases. Fig 2 shows sharp variations in transmission rates and removal rates across different time periods, indicating the time-varying nature of these rates. The estimated I(t) and R(t) overlapped well with the observed number of infectious and removed cases, indicating the reasonableness of the method. The pointwise 95% confidence intervals (in yellow) represent the uncertainty of the estimates, which may be due to error in reporting. Fig 3 presents the estimated time-varying reproduction number,bðtÞĝðtÞ À 1 , for several countries. The curves capture the evolving trends of the epidemic for each country. In the US, though the first confirmed case was reported on January 20, 2020, lack of immediate actions in the early stage let the epidemic spread widely. As a result, the US had seen soaring infectious cases, and R 0 ðtÞ reached its peak around mid-March. From mid-March to early April, the US tightened the virus control policy by suspending foreign travels and closing borders, and the federal government and most states issued mandatory or advisory stay-home orders, which seemed to have substantially contained the virus.

The high reproduction numbers with China, Italy, and Sweden at the onset of the pandemic imply that the spread of the infectious disease was not well controlled in its early phases. With the extremely stringent mitigation policies such as city lockdown and mandatory mask-wearing implemented in the end of January, China was reported to bring its epidemic under control with a quickly dropping R 0 ðtÞ in February. This indicates that China might have contained the epidemic, with more people removed from infectious status than those who became infectious.

Sweden is among the few countries that imposed more relaxed measures to control coronavirus and advocated herd immunity. The Swedish approach has initiated much debate. While some criticized that this may endanger the general population in a reckless way, some felt this might terminate the pandemic more effectively in the absence of vaccines [50] . Fig 3 demon- strates that Sweden has a large reproduction number, which however keeps decreasing. The "big V" shape of the reproduction number around May 1 might be due to the reporting errors or lags. Our investigation found that the reported number of infectious cases in that period suddenly dropped and then quickly rose back, which was unusual. Around February 18, a surge in South Korea was linked to a massive cluster of more than 5,000 cases [51] . The outbreak was clearly depicted in the time-varying R 0 ðtÞ curve. Since then, South Korea appeared to have slowed its epidemic, likely due to expansive testing programs and extensive efforts to trace and isolate patients and their contacts [52] . . Estimated I(t), R(t), β(t), γ(t) , and R 0 ðtÞ. The US (left) and China (right) are shown based on the data up to June 30, 2020. The blue dots and the red dashed curves represent the observed data and the model-based predictions, respectively, with 95% confidence interval. More broadly, Fig 3 categorizes countries into two groups. One group features the countries which have contained coronavirus. Countries, such as China and South Korea, took aggressive actions after the outbreak and presented sharper downward slopes. Some European countries such as Italy and Spain and Mideastern countries such as Iran, which were hit later than the East Asian countries, share a similar pattern, though with much flatter slopes. On the other hand, the US, Brazil, and Sweden are still struggling to contain the virus, with the R 0 ðtÞ curves hovering over 1. We also caution that, among the countries whose R 0 ðtÞ dropped below 1, the curves of the reproduction numbers are beginning to uptick, possibly due to the resumed economy activities.

We have developed a web application (https://younghhk.shinyapps.io/tvSIRforCOVID19/) to facilitate users' application of the proposed method to compute the time-varying reproduction number, and estimated and predict the daily numbers of active cases and removed cases for the presented countries and other countries; see Fig 4 for an illustration.

Our code was written in R [53] , using the bs function in the splines package for cubic B-spline approximation, the nlm function in the stats package for nonlinear minimization, and the jacobian function in the numDeriv package for computation of gradients and hessian matrices. Graphs were made by using the ggplot2 package. Our code can be found on the aforementioned shiny website. 

The rampaging pandemic of COVID-19 has called for developing proper computational and statistical tools to understand the trend of the spread of the disease and evaluate the efficacy of mitigation measures [54] [55] [56] [57] . We propose a Poisson model with time-dependent transmission and removal rates. Our model accommodates possible random errors and estimates a timedependent disease reproduction number, R 0 ðtÞ, which can serve as a metric for timely evaluating the effects of health policies.

There have been substantial issues, such as biases and lags, in reporting infectious cases, recovery, and deaths, especially at the early stage of the outbreak. As opposed to the deterministic SIR models that heavily rely on accurate reporting of initial infectious and removed cases, our model is more robust towards mis-specifications of such initial conditions. Applications of our method to study the epidemics in selected countries illustrate the results of the virus containment policies implemented in these countries, and may serve as the epidemiological benchmarks for the future preventive measures.

Several methodological questions need to be addressed. First, we analyzed each country separately, without considering the traffic flows among these countries. We will develop a joint model for the global epidemic, which accounts for the geographic locations of and the connectivity among the countries.

Second, incorporating timing of public health interventions such as the shelter-in-place order into the model might be interesting. However, we opted not to follow this approach as no such information exists for the majority countries. On the other hand, the impact of the interventions or the change point can be embedded into our nonparametric time-dependent estimates.

Third, the validity of the results of statistical models eventually hinges on the data transparency and accuracy. For example, the results of Chinazzi et al. [58] suggested that in China only one of four cases were detected and confirmed. Also, asymptomatic cases might have been undetected in many countries. All of these might have led to underestimation of the actual number of cases. Moreover, the collected data could be biased toward patients with severe infection and with insurance, as these patients were more likely to seek care or get tested. More in-depth research is warranted to address the issue selection bias.

Finally, our present work is within the SIR framework, where removed individuals include recovery and deaths, who hypothetically are unlikely to infect others. Although this makes the model simpler and widely adopted, the interpretation of the γ parameter is not straightforward. Our subsequent work is to develop a susceptible-infectious-recovered-deceased (SIRD) model, in which the number of deaths and the number of recovered are separately considered. We will report this elsewhere.

Containment of COVID-19 requires the concerted effort of health care workers, health policy makers as well as citizens. Measures, e.g. self-quarantine, social distancing, and shelter in place, have been executed at various phases by each country to prevent the community transmission. Timely and effective assessment of these actions constitutes a critical component of the effort. SIR models have been widely used to model this pandemic. However, constant transmission and removal rates may not capture the timely influences of these policies.

We propose a time-varying SIR Poisson model to assess the dynamic transmission patterns of COVID-19. With the virus containment measures taken at various time points, R 0 may vary substantially over time. Our model provides a systematic and daily updatable tool to evaluate the immediate outcomes of these actions. It is likely that the pandemic is ending and many countries are now shifting gear to reopen the economy, while preparing to battle the second wave of virus attack [59, 60] . Our tool may shed light on and aid the implementation of future containment strategies. 

Coronaviruses: an overview of their replication and pathogenesis

Bats are natural reservoirs of SARS-like coronaviruses

Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus

Human coronavirus and severe acute respiratory infection in Southern Brazil. Pathogens and Global Health

Evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease

Johns Hopkins Cornonavirus Resource Center

Real-time epidemic forecasting for pandemic influenza

Mathematical models of infectious disease transmission

Modelling transmission and control of the COVID-19 pandemic in Australia

Challenges in control of Covid-19: short doubling time and long delay to effect of interventions

Individual vaccination as Nash equilibrium in a SIR model with application to the 2009-2010 influenza A (H1N1) epidemic in France

Estimating epidemic parameters: Application to H1N1 pandemic data

Bayesian estimation of the dynamics of pandemic (H1N1) 2009 influenza transmission in Queensland: A space-time SIR-based model. Environmental Research

Modeling super-spreading events for infectious diseases: case study SARS

Deterministic SIR (Susceptible-Infected-Removed) models applied to varicella outbreaks

An introduction to compartmental modeling for the budding infectious disease modeler

Risk Analysis Foundations, Models, and Methods

A contribution to the mathematical theory of epidemics

Statistics based predictions of coronavirus 2019-nCoV spreading in mainland China. medRxiv

A time delay dynamical model for outbreak of 2019-nCoV and the parameter identification

Epidemic analysis of COVID-19 in China by dynamical modeling

Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV

Effective containment explains sub-exponential growth in confirmed cases of recent COVID-19 outbreak in mainland China

Lessons from the history of quarantine, from plague to influenza A. Emerging Infectious Diseases

A Time-dependent SIR model for COVID-19 with undetectable infected persons

SIR model with time dependent infectivity parameter: approximating the epidemic attractor and the importance of the initial phase

An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China. medRxiv

Modeling Count Data

Methods for estimating disease transmission rates: Evaluating the precision of Poisson regression and two novel methods

Fitting outbreak models to data from many small norovirus outbreaks

Multi-species SIR models from a dynamical Bayesian perspective

The estimation of the basic reproduction number for infectious diseases

Mathematical Epidemiology of Infectious Diseases: Model Building

Transmission potential of smallpox: estimates based on detailed data from an outbreak

Measurability of the epidemic reproduction number in data-driven contact networks

A Time-dependent SIR model for COVID-19 with undetectable infected persons

Notes on R0

A Practical Guide to Splines

Parameter estimation for differential equations: a generalized smoothing approach

Modelling transcriptional regulation using Gaussian processes

Linear latent force models using Gaussian processes

Latent force models

Mechanistic hierarchical Gaussian processes

Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation

New reproducing kernel functions. Mathematical Problems in Engineering

A review of spline function procedures in R. BMC medical research methodology

A note on the jackknife, the bootstrap and the delta method estimators of bias and variance

COVID-19, chronicle of an expected pandemic

Covid-19: how doctors and healthcare systems are tackling coronavirus worldwide

Closing borders is ridiculous': the epidemiologist behind Sweden's controversial coronavirus strategy

Why a South Korean church was the perfect petri dish for coronavirus

Coronavirus cases have dropped sharply in South Korea

R: A Language and Environment for Statistical Computing

Current status of global research on novel coronavirus disease (COVID-19): A bibliometric analysis and knowledge mapping. Available at SSRN 3547824

Investigating the cases of novel coronavirus disease (COVID-19) in China using dynamic statistical techniques

The impact of social distancing and epicenter lockdown on the COVID-19 epidemic in mainland China: A data-driven SEIQR model study. medRxiv

COVID-19 Italian and Europe epidemic evolution: A SEIR model with lockdown-dependent transmission rate based on Chinese data

The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak

As China's virus cases reach zero, experts warn of second wave

Asian nations face second wave of imported cases