Time Fused Coefficient SIR Model with Application to COVID-19 Epidemic in the United States

Hou-Cheng Yang, Yishu Xue, Yuqing Pan, Qingyang Liu, Guanyu Hu

2020-08-10

Abstract: In this paper, we propose a Susceptible-Infected-Removed (SIR) model with time fused coefficients. In particular, our proposed model discovers the underlying time homogeneity pattern of the SIR model's transmission rate and removal rate via Bayesian shrinkage priors. The properties of the proposed models are examined, and a Markov chain Monte Carlo sampling algorithm is used to sample from the posterior distribution. Computation is facilitated by the nimble package in R, which provides fast computation for our proposed method. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods. We further apply the proposed methodology to analyze COVID-19 data at different levels in the United States.

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first identified in December 2019, and then rapidly spread across the world, causing the current global pandemic of coronavirus disease 2019 (COVID-19). As of July 23, 2020, the novel coronavirus has spread to 216 countries and territories, with a total of more than 14 million confirmed infections and 600,000 fatal cases worldwide (World Health Organization, 2020). Eight months after the initial outbreak, large numbers of new cases are still reported from many major countries, resulting not only in public health crises, but also in severe economic and political ramifications. As the pandemic rages on with no end in sight, it is of urgent necessity for epidemiologists to quantify and interpret the trajectories of the COVID-19 pandemic, so as to help formulate more effective public policies.

The Susceptible-Infectious-Recovered (SIR; Kermack and McKendrick, 1927) model and its variants, such as the Susceptible-Infected-Removed-Susceptible (SIRS; Kermack and McKendrick, 1932, 1933) and Susceptible-Exposed-Infected-Removal (SEIR; Hethcote, 2000) models, are commonly used to describe the dynamics of an infectious disease in a certain region. In the basic SIR model, a population is segregated into three time-dependent compartments: Susceptible (S(t)), Infectious (I(t)), and Recovered/removed (R(t)). One who does not have the disease at time t, but may be infected through contact with an infected person, belongs to the susceptible compartment. The infected compartment is made up of those who have the disease at time t and can potentially infect a susceptible individual by contact. The recovered compartment includes those who are either recovered or dead from the disease and are no longer contagious, i.e., removed from the infectious compartment, at time t. The removal can be due to several possible reasons, including death, recovery with immunity against reinfection, and quarantine and isolation from the rest of the population. A recovered/removed individual will not return to the susceptible compartment. Such model assumptions match well with the COVID-19 outbreak, and therefore we adopt the SIR model as our basic model in this paper. From the statistical perspective, the key study objective is the inference of the transmission and recovery rates of the model.
Regarding time-invariant SIR and SEIR models, there have been timely applications to early epidemic data right after the coronavirus outbreak (Read et al., 2020; Tang et al., 2020; Wu et al., 2020). In order to differentiate the evolutionary patterns of the COVID-19 epidemic among different regions, Hu and Geng (2020) developed a Bayesian heterogeneity learning methodology for the SIRS model. As the epidemic continued to spread rampantly, statisticians proposed time-dependent models based on SIR to elucidate the temporal dynamics of this disease (Jo et al., 2020; Sun et al., 2020). Estimated under various assumptions on temporal smoothness, the transmission and recovery rates in these models vary continuously over time, which makes them inefficient at detecting abrupt changes. In contrast, we consider the scenario in which the transmission and recovery rates are constant within locally stationary periods segmented by a collection of change points, which aligns with the fact that epidemic progression naturally partitions into different stages. This motivates us to estimate a piecewise constant model.

The fused lasso (Tibshirani et al., 2005), an L1 sparsity-inducing penalty imposed on all successive differences, is one of the most popular methods for time fusion and change point detection. Motivated by the frequentist L1 fusion penalty, Kyung et al. (2010) proposed its Bayesian counterpart, namely the Bayesian fused lasso, which imposes Bayesian lasso priors (Park and Casella, 2008), i.e., independent Laplace priors, on the differences. To address the posterior inconsistency problem of the Bayesian fused lasso, Song and Cheng (2019) used heavier-tailed student-t priors for Bayesian fusion estimation. In addition to the Bayesian lasso and student-t priors, other Bayesian shrinkage priors with different statistical properties, such as the spike-and-slab (George and McCulloch, 1993) and horseshoe (Carvalho et al., 2010) priors, can also be adopted to induce time fusion.

The contributions of this paper are three-fold. First, we apply three different types of shrinkage priors to capture the time homogeneity patterns of the infectious and removal rates under the SIR framework. Second, our proposed method can be easily implemented with the nimble package in R, and a straightforward tutorial on using nimble to specify shrinkage priors under the SIR framework is provided. Finally, several interesting findings are obtained through analyses of COVID-19 data at the national, state, and county levels.

The remainder of this paper is organized as follows. In Section 2, the COVID-19 data for the selected states and counties are introduced. We provide a literature review of the SIR model and present our model framework in Section 3. Details of the Bayesian inference are presented in Section 4, including the MCMC algorithm and implementation code. Simulation studies are conducted in Section 5. Applications of the proposed methods to COVID-19 data are reported in Section 6. Section 7 concludes the paper with a discussion.

The COVID-19 data are obtained from the R package COVID19 (Guidotti and Ardia, 2020). We consider the observations recorded from 2020-05-14 to 2020-07-23, a 71-day period. US nationwide aggregated data, as well as data for five states: New York (NY), California (CA), Florida (FL), South Dakota (SD), and Wyoming (WY), are our focus in this study. We also consider county-level data for three counties: Los Angeles, Miami-Dade, and New York City. The data are reported daily, with variables including the population size, the number of confirmed cases, the number of recoveries, and the number of deaths. Note that the removal group for county-level data only contains deaths, as no information on recoveries is available. Similar to Sun et al. (2020), a three-point moving average filter is applied to the infectious group I(t) and the removal group R(t) to remove noise.
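As an illustration of how such series can be assembled and smoothed, the following is a minimal sketch using the COVID19 package; the covid19() arguments and the column names used here (confirmed, recovered, deaths, population, administrative_area_level_2) are assumptions about the package interface, not details given in the paper.

library(COVID19)

# Pull state-level records for the studied window (level = 2 denotes states).
states <- covid19(country = "United States", level = 2,
                  start = "2020-05-14", end = "2020-07-23")
ny <- states[states$administrative_area_level_2 == "New York", ]

# Compartment counts: removed = recovered + dead, infectious = confirmed - removed.
R_t <- ny$recovered + ny$deaths
I_t <- ny$confirmed - R_t
S_t <- ny$population - ny$confirmed

# Three-point moving average to reduce reporting noise in I(t) and R(t).
smooth3 <- function(x) as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
I_smooth <- smooth3(I_t)
R_smooth <- smooth3(R_t)

County-level series could be pulled analogously (level = 3 in this package, to the best of our understanding), keeping in mind that only deaths are available for the removal group at that level.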
Due to the large size of the susceptible group, the group sizes are visualized on a natural log scale in Figure 1 and Figure 2. The infectious and removal numbers are much smaller in SD and WY than in the other states. As NY remained under lockdown during the studied period, both its infectious and removal groups experienced a slow increase. For CA and FL, however, potentially due to reopening in early May, the infectious and removal groups saw rapid increases. The three selected counties are the major metropolitan areas of CA, FL, and NY, and the trends observed are similar to those of their respective states.

In the SIR model, we consider a fixed total population of size N. By "fixed", we assume that the population size does not vary over time. The effects of natural birth and death are not considered here, as the outbreak period of an infectious disease is much shorter than the average human lifetime. Denote, at time t (t ≥ 1), the counts of susceptible, infectious, and recovered/removed persons within a given region as S(t), I(t), and R(t), respectively; the relationship N = S(t) + I(t) + R(t) always holds. The focus of this study under the SIR framework is how the size of each group changes over time. Two parameters in the SIR model are time-invariant: the transmission rate β and the recovery rate γ. The transmission rate β controls how much the disease can be transmitted through exposure; it is jointly determined by the chance of contact and the probability of disease transmission. The recovery rate γ stands for the rate at which infected individuals recover or die. The potentially time-varying nature of these two parameters is ignored in traditional SIR modeling, a rather strong simplifying assumption that limits the model's power to predict disease trends. Therefore, we adopt the time-varying SIR (vSIR; Sun et al., 2020) framework, where both β and γ are functions of time t.

The vSIR model can be viewed as both a deterministic model and a stochastic model. The deterministic vSIR model describes the number of people in each compartment with ordinary differential equations (ODEs). A generalized version of the deterministic vSIR model, with the infectious rate β(t) and the removal rate γ(t) varying with time, is as follows:

dS(t)/dt = −β(t) S(t) I(t) / N,
dI(t)/dt = β(t) S(t) I(t) / N − γ(t) I(t),
dR(t)/dt = γ(t) I(t).

While the deterministic vSIR model is appealing due to its simplicity, the spread of a disease is naturally stochastic: disease transmission between two individuals is random rather than deterministic. The stochastic formulation of the vSIR model is, therefore, preferred for epidemic modeling purposes, as it allows for randomness in the disease spreading process. We will discuss our model framework in Section 3.3. For most infectious diseases, smoothly changing infectious and removal rates are not always realistic, since government policies can have dramatic effects on the parameters of the SIR model. In other words, the infectious rate β(t) and removal rate γ(t) can take a certain fixed value within a certain time period, and change to another value in the period that follows.
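To make the deterministic system above concrete, here is a minimal sketch that solves it numerically with the deSolve package, assuming an illustrative piecewise-constant β(t) and γ(t) and hypothetical initial counts; these values are not estimates from the paper.

library(deSolve)

# Right-hand side of the deterministic vSIR ODEs with piecewise-constant rates.
vsir_rhs <- function(t, state, parms) {
  with(as.list(state), {
    N    <- S + I + R
    beta <- if (t < 40) 0.30 else 0.15   # hypothetical transmission rate schedule
    gam  <- if (t < 40) 0.05 else 0.10   # hypothetical removal rate schedule
    dS <- -beta * S * I / N
    dI <-  beta * S * I / N - gam * I
    dR <-  gam * I
    list(c(dS, dI, dR))
  })
}

out <- ode(y = c(S = 999990, I = 10, R = 0),
           times = seq(0, 80, by = 1), func = vsir_rhs, parms = NULL)
head(out)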
Identifying the subpopulation structure of these two parameters, with time fusion patterns, in the SIR model will enhance our understanding of infectious diseases such as COVID-19. In this paper, we assume that the successive differences of the infectious rate, ∆β(t) = β(t) − β(t−1), and of the removal rate, ∆γ(t) = γ(t) − γ(t−1), both have an unknown clustered pattern with respect to time. Both ∆β(1) and ∆γ(1) are defaulted to 0. For example, with a cluster of 0's in the successive differences, β(t) would remain constant over the corresponding time period. Towards this end, we use three different shrinkage priors on both ∆β(t) and ∆γ(t) to detect such clusters: the student-t prior, the horseshoe prior, and the spike-and-slab prior (see Piironen et al., 2017; Song and Cheng, 2019, for more discussion).

The first prior we consider is the student-t prior. Such a fusion prior shrinks successive differences towards zero and hence induces posterior blocking. Compared to the Laplace fusion prior, the student-t fusion prior induces a stronger shrinkage effect and enjoys a nice posterior consistency property (Song and Cheng, 2019). As the Laplace prior suffers from posterior inconsistency due to its light tails, Song and Liang (2017) suggested using heavy-tailed distributions such as the student-t prior. The Laplace prior also usually leads to smoothly varying estimates, which means it cannot identify the clustered structure. The student-t shrinkage prior on the successive differences is

∆β(t) ~ t_{df_β}(l_β), t = 2, ..., T,

where T denotes the termination time of observation and t_{ω1}(ω2) denotes the student-t distribution with degrees of freedom ω1 and scale parameter ω2. Note that the above student-t distribution can be rewritten as an inverse-gamma scaled Gaussian mixture, and hence the model can be re-written as

∆β(t) | λ_t ~ N(0, λ_t), λ_t ~ IG(a_t, b_t), t = 2, ..., T,

where a_t, b_t satisfy df_β = 2a_t and l_β = b_t/a_t.

The second prior is the horseshoe prior (Carvalho et al., 2009, 2010). It is a continuous shrinkage prior and a so-called global-local shrinkage prior,

∆β(t) | λ_t, c ~ N(0, c² λ_t²), λ_t ~ C⁺(0, 1), t = 2, ..., T,

where c is a global hyperparameter which shrinks all parameters towards zero, while the heavy-tailed half-Cauchy prior C⁺(0, 1) for the local hyperparameter λ_t allows a subset of the elements of ∆β(t) to escape the shrinkage. The horseshoe prior has exhibited ideal theoretical characteristics and good empirical performance (Datta et al., 2013; Van Der Pas et al., 2014).

The third prior is the spike-and-slab prior (Mitchell and Beauchamp, 1988; George and McCulloch, 1993). It is a two-component discrete mixture prior. In this paper, we write it as a two-component mixture of Gaussian distributions,

∆β(t) | λ_t ~ λ_t N(0, c) + (1 − λ_t) N(0, εc), λ_t ~ Bernoulli(p), t = 2, ..., T,

where c controls the slab width, ε is a small value, and λ_t is an indicator that takes values in {0, 1}. In some cases, ε is set to 0 so that the spike becomes a point mass at the origin, δ_0. This distribution can be sensitive to the prior choices of the slab width and the prior inclusion probability. In this paper, we fix the prior inclusion probability p. For c, we assume an inverse-gamma distribution, which imparts a fairly heavy tail and keeps probability mass further from zero than a Gamma distribution would.
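To illustrate how differently the three priors behave, the following sketch draws a large number of successive differences from each of them in base R; the hyperparameter values (an IG(2, 0.0001) mixing distribution for the student-t, a global scale of 0.1 for the horseshoe, and a spike standard deviation one tenth of the slab's, with the slab variance fixed at 1) are illustrative assumptions that roughly echo the implementation choices described in Section 4, not the paper's exact settings.

set.seed(1)
n <- 10000

# Student-t prior via its inverse-gamma scaled Gaussian mixture representation:
# lambda_t ~ IG(2, 1e-4), Delta_beta | lambda_t ~ N(0, lambda_t).
lambda_t  <- 1 / rgamma(n, shape = 2, rate = 1e-4)
d_student <- rnorm(n, 0, sqrt(lambda_t))

# Horseshoe prior: lambda_t ~ C+(0, 1), Delta_beta | lambda_t ~ N(0, c^2 lambda_t^2).
c_glob    <- 0.1
lambda_hs <- abs(rcauchy(n))
d_horse   <- rnorm(n, 0, c_glob * lambda_hs)

# Spike-and-slab prior: a Bernoulli indicator selects the wide slab or the narrow spike.
p       <- 0.5
in_slab <- rbinom(n, 1, p)
d_ss    <- ifelse(in_slab == 1, rnorm(n, 0, 1), rnorm(n, 0, 0.1))

# Proportion of draws essentially shrunk to zero under each prior.
sapply(list(student_t = d_student, horseshoe = d_horse, spike_slab = d_ss),
       function(x) mean(abs(x) < 0.01))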
In this section, we propose our model framework for the COVID-19 data, named the hierarchical time fusion SIR model (tf-SIR). We first consider the vSIR-Poisson process framework of Sun et al. (2020) with two time-varying parameters, β(t) and γ(t). Let N = S(t) + I(t) + R(t) be the total population, M(t) = I(t) + R(t) denote the cumulative number of diagnosed cases, and ∆M(t) = M(t) − M(t−1), ∆R(t) = R(t) − R(t−1) represent the daily changes of M(t) and R(t). The initial values ∆M(1) and ∆R(1) are defaulted, respectively, to M(1) and R(1). Denote the time domain as t = 1, ..., T. Hence we have

∆M(t) | S(t−1), I(t−1) ~ Poisson(β(t) S(t−1) I(t−1) / N),
∆R(t) | I(t−1) ~ Poisson(γ(t) I(t−1)), t = 2, ..., T. (4)

Next, we apply three different priors on the successive differences of both parameters, ∆β(t) = β(t) − β(t−1) and ∆γ(t) = γ(t) − γ(t−1), for a Bayesian fusion problem. The first prior is the student-t prior,

∆β(t) ~ t_{df_β}(l_β), ∆γ(t) ~ t_{df_γ}(l_γ), t = 2, ..., T,

where t_{ω1}(ω2) denotes the student-t distribution with degrees of freedom ω1 and scale parameter ω2. Note that the above student-t distribution can be rewritten as an inverse-gamma scaled Gaussian mixture, and the model can be re-written as

∆β(t) | λ_t ~ N(0, λ_t), λ_t ~ IG(a_t, b_t),
∆γ(t) | η_t ~ N(0, η_t), η_t ~ IG(c_t, d_t), t = 2, ..., T,

where a_t, b_t satisfy the conditions df_β = 2a_t and l_β = b_t/a_t. Similarly, c_t, d_t satisfy the conditions df_γ = 2c_t and l_γ = d_t/c_t.

With the horseshoe prior, the model can be formulated as

∆β(t) | λ_t ~ N(0, λ_t² σ²_β), λ_t ~ C⁺(0, 1),
∆γ(t) | η_t ~ N(0, η_t² σ²_γ), η_t ~ C⁺(0, 1), t = 2, ..., T,

where σ²_β and σ²_γ are global parameters shrinking all ∆β(t) and ∆γ(t) towards zero, and both are fixed numbers. Both λ_t and η_t are local parameters that allow some ∆β(t) and ∆γ(t) to escape the shrinkage. By changing the values of σ²_β and σ²_γ, different levels of sparsity can be achieved: with larger values there is little shrinkage, while with smaller values all increments tend to be shrunk to near 0.

With the third prior, spike-and-slab, the model is formulated as

∆β(t) | λ_t ~ λ_t N(0, σ²_β) + (1 − λ_t) N(0, ε² σ²_β), λ_t ~ Bernoulli(p),
∆γ(t) | η_t ~ η_t N(0, σ²_γ) + (1 − η_t) N(0, ε² σ²_γ), η_t ~ Bernoulli(π), t = 2, ..., T, (6)

where both p and π are fixed, ε² is a small value, and λ_t, η_t ∈ {0, 1} indicate whether the successive differences of the two parameters are close to zero. We have a spike when λ_t = 0 and η_t = 0, and a slab when λ_t = 1 and η_t = 1.

In this section, we present the main functions implemented with the R package nimble (de Valpine et al., 2017), in particular the function nimbleCode(). The simulated data contain four variables: a time variable ranging from 0 to 80 with increments of size 1, and the numbers of observations within each of the S, I, and R groups. Parameter deltaM denotes the quantity ∆M(t) and deltaR denotes the quantity ∆R(t) in Equation (4). Both deltaM and deltaR are vectors of length 80, corresponding to ∆M(t) and ∆R(t) for t = 1, ..., 80. Variables beta and gamma are also vectors of length 80, representing β(t) and γ(t) for t = 1, ..., 80. Variables sigma2beta and sigma2gamma correspond to the variance components in the normal distributions for β(t) − β(t−1) and γ(t) − γ(t−1), and both are assigned non-informative IG(0.5, 0.5) priors.

The next code chunk iterates through the times after day 1. Variables deltaM and deltaR still follow the specified Poisson distributions, and the increment in β(t), ∆β(t), denoted in the code as theta_beta[t-1] for convenience of indexing, follows N(0, λ_{t−1} σ²_β), with the prior for λ_{t−1} specified as IG(2, 0.0001). A similar setting is applied to the increment in γ(t), denoted by theta_gamma. After defining the model, data, constants, and initial values are supplied to run the MCMC. Here we set T <- 80 for 80 days of development. The variable population is the total number of observations, which is set to 1000000 and remains constant. Variables deltaM and deltaR denote the first-order differences in the generated data for cumulative diagnosed cases and recovered cases, respectively. The list SIRInits contains the initial values used for the simulation, and finally the MCMC is invoked by the function nimbleMCMC(). With niter = 50000, thin = 10, and nchains = 1, the output mcmc.out contains 5000 rows of results, which, after the burn-in portion is discarded, can be used for Bayesian estimation and inference.
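The full nimbleCode block for the student-t fusion prior is not shown above, so the following is a minimal sketch of what such a block might look like given the description in this section; the priors placed on beta[1] and gamma[1] (uniform on (0, 1)) and the exact parameterizations are assumptions for illustration rather than the authors' code.

library(nimble)

SIRCode <- nimbleCode({
  # Non-informative priors on the variance components.
  sigma2beta  ~ dinvgamma(0.5, 0.5)
  sigma2gamma ~ dinvgamma(0.5, 0.5)
  # Day-one rates (priors assumed here for illustration).
  beta[1]  ~ dunif(0, 1)
  gamma[1] ~ dunif(0, 1)
  for (t in 2:T) {
    # vSIR-Poisson increments for newly diagnosed and newly removed cases.
    deltaM[t] ~ dpois(beta[t] * S[t - 1] * I[t - 1] / population)
    deltaR[t] ~ dpois(gamma[t] * I[t - 1])
    # Random-walk increments with inverse-gamma scaled Gaussian (student-t) priors.
    beta[t]  <- beta[t - 1] + theta_beta[t - 1]
    gamma[t] <- gamma[t - 1] + theta_gamma[t - 1]
    theta_beta[t - 1]  ~ dnorm(0, var = lambda[t - 1] * sigma2beta)
    theta_gamma[t - 1] ~ dnorm(0, var = eta[t - 1] * sigma2gamma)
    lambda[t - 1] ~ dinvgamma(2, 0.0001)
    eta[t - 1]    ~ dinvgamma(2, 0.0001)
  }
})

An object of this form is what the nimbleMCMC() call below refers to as SIRCode.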
SIRdata <- list(population = population, deltaM = deltaM, deltaR = deltaR,
                S = S, I = I)
SIRConsts <- list(T = T)
SIRInits <- list(beta = rep(0.3, T), gamma = rep(0.01, T),
                 eta = rep(1, T - 1), lambda = rep(5, T - 1),
                 theta_beta = rep(0, T - 1), theta_gamma = rep(0, T - 1),
                 sigma2beta = 1, sigma2gamma = 1)
mcmc.out <- nimbleMCMC(code = SIRCode, data = SIRdata, constants = SIRConsts,
                       inits = SIRInits,
                       monitors = c("beta", "gamma", "lambda", "eta"),
                       niter = 50000, thin = 10, nchains = 1)

For the horseshoe prior, parameters are initialized in the same way as above, but the iteration that updates the day-to-day increments is different. Here the variable theta_beta[t-1] follows the distribution N(0, c² λ_t²), where λ_t is constructed as the absolute value of the ratio of two standard normally distributed variables, one of which is a_lam[t-1], so that λ_t has a half-Cauchy distribution.

In the end, the core code of the spike-and-slab prior is given below. The spike and slab components for ∆β(t) are denoted by theta_beta2[t-1] and theta_beta1[t-1], respectively. The variable lambda[t-1] represents the parameter λ_{t−1} in Equation (6), and is assumed to follow a Bern(0.5) distribution. Similar constructions are applied to the spike and slab variables related to ∆γ(t), namely theta_gamma2[t-1], theta_gamma1[t-1], and eta[t-1].

beta[t] <- beta[t - 1] + lambda[t - 1] * theta_beta1[t - 1] +
  (1 - lambda[t - 1]) * theta_beta2[t - 1]
theta_beta1[t - 1] ~ dnorm(0, sd = sqrt(sigma2beta))
theta_beta2[t - 1] ~ dnorm(0, sd = sqrt(sigma2beta / 100))
lambda[t - 1] ~ dbern(0.5)
gamma[t] <- gamma[t - 1] + eta[t - 1] * theta_gamma1[t - 1] +
  (1 - eta[t - 1]) * theta_gamma2[t - 1]
theta_gamma1[t - 1] ~ dnorm(0, sd = sqrt(sigma2gamma))
theta_gamma2[t - 1] ~ dnorm(0, sd = sqrt(sigma2gamma / 100))
eta[t - 1] ~ dbern(0.5)
}

We use the R package SimInf (Widgren et al., 2019) to generate data. Four designs over a time span of 80 days are considered. The time domain is divided into four equally sized pieces, each spanning 20 days, and both the infectious rate β(t) and the removal rate γ(t) are constant within each piece. The four designs have different β(t), γ(t), and population sizes N, and the numerical values are listed in Table 1. Four example datasets, one for each design, are visualized in Figure 3. Design 1 corresponds to a fairly infectious disease with a high removal/recovery rate, and Design 2 corresponds to a mildly infectious disease with a similar removal/recovery rate. Designs 3 and 4 have larger population sizes, where a disease with smaller values of β(t) can still infect a large portion of the population, as demonstrated by Design 3. Design 4, with small β(t) and γ(t), exhibits slow overall development. A total of 100 replicates are performed for each design.

The estimation performance for β(t) is measured using the following metrics:

MAB(t) = (1/100) Σ_{ℓ=1}^{100} |β̂_ℓ(t) − β(t)|,
MSE(t) = (1/100) Σ_{ℓ=1}^{100} (β̂_ℓ(t) − β(t))²,

together with SD(t), the sample standard deviation of β̂_ℓ(t) around β̄(t) = (1/100) Σ_{ℓ=1}^{100} β̂_ℓ(t) across the 100 replicates, where β̂_ℓ(t) is the posterior estimate of β at time t in the ℓ-th replicate, for ℓ = 1, ..., 100 and t = 1, ..., 80. The same metrics are computed for γ(t).
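A minimal sketch of how these metrics can be computed in R follows; the matrix beta_hat (100 replicates by 80 days) and the vector beta_true are hypothetical placeholders standing in for the simulation output, not results from the paper.

# Hypothetical stand-ins: 100 replicates of an 80-day estimated curve,
# and a piecewise-constant truth.
beta_true <- rep(c(0.20, 0.25, 0.15, 0.10), each = 20)
beta_hat  <- matrix(rnorm(100 * 80, mean = rep(beta_true, each = 100), sd = 0.02),
                    nrow = 100, ncol = 80)

bias <- sweep(beta_hat, 2, beta_true)   # beta_hat[l, t] - beta_true[t]
MAB  <- colMeans(abs(bias))             # mean absolute bias at each day
MSE  <- colMeans(bias^2)                # mean squared error at each day
SD   <- apply(beta_hat, 2, sd)          # spread across replicates at each day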
The parameter estimates for β(t) and γ(t) over the 100 replicates are visualized in Figures 4 and 5, respectively, and the three performance metrics are plotted in Figures 6 and 7. The true underlying values are also plotted as (blue) dashed lines in Figures 4 and 5. The first observation is that, under all four designs, all three models yield quite accurate estimates, as the grey bands formed by the 100 replicate estimates lie close to or around the (blue) dashed lines in both figures. Secondly, as the population sizes in Designs 3 and 4 are 10 times those in Designs 1 and 2, their corresponding grey bands are, overall, tighter. Thirdly, in both figures, the grey bands corresponding to the t-shrinkage prior are narrower than those for the horseshoe and spike-and-slab priors, indicating relatively stable estimation performance overall. This is also reflected in the SD panels of the performance measure plots in Figures 6 and 7.

The three models are compared in terms of the performance metrics in Figures 6 and 7. One interesting observation is that the MAB and MSE tend to be large near t ∈ {20, 40, 60}, which is where the parameter changes occur; they then stabilize as the disease continues to develop. The MAB and MSE are also larger for relatively large values of the true parameters than for small values. As can be observed from the third column of both Figures 6 and 7, the horseshoe and spike-and-slab priors perform similarly in terms of MAB, MSE, and SD. When the sample size is 10^6, the t-shrinkage prior yields parameter estimates that are overall more stable and have smaller SD than the other two; this difference, however, decreases as the sample size increases.

The proposed methodology is applied to the state-level and county-level COVID-19 data introduced in Section 2. Analyses for other states and counties can be conducted in the same way and are omitted from this paper. The number of MCMC iterations is set to 5000, with the first 2000 iterations treated as burn-in. All other settings remain the same as in the simulation study. The estimated infectious and removal rates are shown in the accompanying figures. At both the state and the county level, the three different priors give similar results. We are interested in a fused estimate of the trend. We find that the infectious rate for New York is smaller than those of the other states during this period. Florida witnesses a pronounced peak after mid-June, which results from its aggressive reopening. As for the removal rate, New York, Florida, and California show similar results. The nationwide removal rate is roughly constant over time, with few peaks during this period. The trends of the infectious and removal rates for South Dakota and Wyoming are similar. At the county level, the infectious rate of New York City stays flat, while Miami-Dade county has a higher infectious rate after mid-June. The removal rates are very small in all three counties, since the removal group in the county-level data only contains deaths.

Figure 11: Plot of the estimated removal rate γ(t) for the three selected counties over the studied time frame.

In this paper, we proposed the tf-SIR model to capture the group structure of the infectious rate and removal rate across different time periods by using Bayesian shrinkage priors. To the best of our knowledge, this is the first attempt in the literature to use Bayesian shrinkage to recover an unknown grouping structure in the SIR model. Our simulation results indicate that the proposed method has reasonable performance and the ability to capture the group patterns of the infectious and removal rates. The analysis of COVID-19 data also brings new understanding of infectious diseases such as COVID-19. Moreover, our tf-SIR model can be used not only to model and assess the COVID-19 pandemic but also other epidemics. In addition, three topics beyond the scope of this paper are worth further investigation. First, in our real data application, a moving average approach is applied to deal with measurement errors in the observed data; proposing a measurement error model for the SIR framework is an interesting direction for future work. Furthermore, different states may have similar infection and removal patterns.
Subgroup detection across different states would help governments design their policies. Finally, establishing theoretical guarantees for the proposed methods, such as posterior concentration rates, is also left to future research.

References

Handling sparsity via the horseshoe.
The horseshoe estimator for sparse signals.
A time-dependent SIR model for COVID-19.
Asymptotic properties of Bayes risk for the horseshoe prior.
Programming with models: writing statistical algorithms for general model structures with NIMBLE.
Variable selection via Gibbs sampling.
COVID-19 data hub.
The mathematics of infectious diseases.
Heterogeneity learning for SIRS model: an application to the COVID-19.
Analysis of COVID-19 spread in South Korea using the SIR model with time-dependent parameters and deep learning.
A contribution to the mathematical theory of epidemics.
Contributions to the mathematical theory of epidemics. II. The problem of endemicity.
Contributions to the mathematical theory of epidemics. III. Further studies of the problem of endemicity.
Penalized regression, standard errors, and Bayesian lassos.
Bayesian variable selection in linear regression.
The Bayesian lasso.
Sparsity information and regularization in the horseshoe and other shrinkage priors.
Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions.
Bayesian fusion estimation via t shrinkage.
Nearly optimal Bayesian shrinkage for high dimensional regression.
Tracking and predicting COVID-19 epidemic in China mainland.
Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions.
Sparsity and smoothness via the fused lasso.
The horseshoe estimator: posterior concentration around nearly black vectors.
SimInf: an R package for data-driven stochastic disease spread simulations.
WHO coronavirus disease (COVID-19) dashboard.
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.