key: cord-0931907-reshu6qq
authors: Ji, J.; Wang, C.; Rotolo, M.; Zimmerman, J.
title: Modeling Regional Disease Spread Over Time Using a Dynamic Spatio-temporal Model – With an Application to Porcine Epidemic Diarrhea Virus data in Iowa, U.S.
date: 2020-06-20
journal: Prev Vet Med
DOI: 10.1016/j.prevetmed.2020.105053
sha: d66010c5b1c86efa6b1e455e0eb41cc86aba9920
doc_id: 931907
cord_uid: reshu6qq

Regional surveillance is important for detecting the incursion of new pathogens and informing disease monitoring and control programs. Modeling disease distribution over time can provide insight into the development of more efficient regional surveillance approaches. Herein we propose a Bayesian spatio-temporal model to describe the distribution of porcine epidemic diarrhea virus (PEDV) in Iowa, USA. Model parameters are estimated through a Bayesian spatio-temporal modeling approach that can account for missing values. For illustration, we apply the proposed model to PEDV test results from the Iowa State University Veterinary Diagnostic Laboratory (ISU-VDL). A simulation study carried out to evaluate the model showed that the proposed model captured the pattern of PEDV distribution and its spatio-temporal dependence.

The aim of this project was to explore area surveillance methods for livestock farms, with the animals. However, the imperative of rapid detection must be balanced by the cost of surveillance. An extreme example is the case of bovine spongiform encephalopathy, for which the cost of finding one positive case in slaughter cattle in the European Union was reported as €2.3 million for the year 2003 (Heim and Mumford, 2005). Essentially, the challenge is efficient disease detection at an affordable cost.

(sample collection cost is avoided) and the opportunity for real-time detection.
Typically, contagious infectious diseases spread widely following their initial introduction, reach a peak prevalence, and then establish a cycle in the population that mirrors changes in population

Bayesian methods offer great advantages in the analysis of complex models with complicated data structures. Given the flexibility and generality of the Bayesian framework, we are able to cope with complex problems, especially when we have latent variables, missing data, or multilayered test results. In order to maintain data integrity, the results were processed to remove any result that did not have an associated valid address representing a swine premises. Premises identification numbers (PIN) and premises submission level identifiers were entered into Google Earth and, if a swine premises was located at the entered address, the result was retained. If the address or PIN provided did not pertain to a swine premises, the result was removed from the data set. Binary (positive/non-positive) results were used for summary and analyses. In our study, we selected the 6-month period with the most complete data, August 2016 to January 2017, to address the spread of PEDV.

In this paper, we propose a Bayesian spatio-temporal model for the distribution of PEDV at the county level in Iowa that accounts for the effect of spatial distance among counties. As illustrated not submit samples for testing if the disease status is not a concern and the farm lost all of the producers may have elected to submit samples for testing to a laboratory other than the ISU-VDL. Our goal is to build a model that reflects the major pattern of missingness. Thus, to account for the missing values, we added constraints to the Bayesian spatio-temporal model based on the two major reasons described earlier.
The paper is organized as follows: Section 2 presents the proposed spatio-temporal model and

where $p_{s,t}$ is the probability of samples in county $s$ at time $t$ being tested PEDV positive, and $\xi_{s,t}$ is its logit transformation. Here $\{\xi_{s,t}\}$ are latent variables which have spatio-temporal dependence. That is, $\xi_{s,t}$ is influenced by its value at the previous time point, $\xi_{s,t-1}$, and by the status of the neighboring counties of $s$ at time $t-1$. To account for this spatio-temporal dependence among counties, we model the increment of $\xi_{s,t}$, $\Delta\xi_{s,t} = \xi_{s,t} - \xi_{s,t-1}$, to be a linear function of the probabilities of being tested as

Here $\beta_0$ is the intercept, which accounts for the average increment of $\xi$ across time without considering any spatial interactions. $d_{s,s'}$ is the distance between county $s$ and county $s'$, and $p_{s',t-1}$ is the probability of samples from county $s'$ at time $t-1$ being tested PEDV positive. In this model, the spatial impact of the other counties $s'$ on county $s$ is quantified as $\alpha \sum_{s' \neq s} \exp(-d_{s,s'})\, p_{s',t-1}$, in which the contribution of each county $s'$ decreases exponentially as the distance $d_{s,s'}$ increases. Thus, for any county, its neighboring counties have a larger influence than counties that are far away. The parameter $\alpha$ controls the overall magnitude of the spatial impacts on $\xi_{s,t}$ from all other counties $s' \neq s$ at the previous time point. The structure of the model is autoregressive and thus $\xi$ has the Markov property. Some other covariates can also potentially affect $\xi_{s,t}$, such as the number of farms in the county and the transportation infrastructure (interstate, US and state highways) in the county. The latter was considered because pigs are routinely moved between farms and counties in the production cycle. The presence of more farms and more developed transportation infrastructure may have affected the distribution of the virus. Thus, we include covariates in the model: for $t = 1, 2, \ldots, T$ and $s = 1, 2, \ldots, n$.
Here $X_{s,t} = (1, X_{1,s,t}, X_{2,s,t}, \ldots, X_{p,s,t})$ is the $(p+1)$-dimensional vector consisting of the constant one and the covariates that affect the distribution of PEDV. $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ is the corresponding vector of regression parameters. The $\phi_{s,t}$'s are random errors accounting for other unmeasured factors, which are independently and identically distributed as $N(0, \tau^2)$. The initial states, $\xi_{s,0}$, for $s = 1, 2, \ldots, n$, are modeled as $\xi_{s,0}$

high. Then one assumption we can make in the analysis is that the decision not to submit samples means that the disease status at that time point is not a concern, thus a low proportion (less than 0.1) of PEDV can be assumed. Therefore, if a county $s$ at time $t$ has $y_{s,t}$ missing, we put constraints on $\xi_{s,t}$ as $\xi_{s,t} \in [\mathrm{logit}(0.001), \mathrm{logit}(0.1)]$ so that the proportion of PEDV is less than 0.1.

In addition, if a farm has submitted samples at time $t-1$ but none at time $t$, i.e. $y_{s,t-1}$ is observed but $y_{s,t}$ is missing, it is very likely that all the piglets in that farm died. PEDV infection remains endemic, and it is impossible for the infection to disappear naturally within a short period of time. In many cases during the acute outbreak, farms were unable to produce healthy piglets for weeks or even a few months. These cases do not fit in our model, since the model is only intended to describe the spatio-temporal trends in PEDV. Therefore, the spatio-temporal dependency will be cut for such cases, which results in the corresponding $\xi_{s,t}$ being no longer

To incorporate these constraints into the model, we first define the following five sets:

For $t = 0$ and all counties $s = 1, 2, \ldots, n$, we have $\xi_{s,0}$, which is the same as the model described in Section 2.1.
For $t = 1, 2, \ldots, T$, and county $s = 1, 2, \ldots, n$, we have the following model:

For $(s,t) \in \Omega_1$, this model sets the density of $\xi_{s,t}$ to a constant, which means its distribution is

For $t = 0$, given that the $\xi_{s,0}$'s are independently and identically distributed, $\xi_0 = (\xi_{1,0}, \ldots, \xi_{n,0})$ has the density:

For $t = 1, \ldots, T$, based on the Markov property, the joint density of $\xi_{1:T} = (\xi_1, \ldots, \xi_T)$ has the following form

Thus the joint pdf for $\xi$ is derived as:

Let $y = (y_1, y_2, \ldots, y_T)$ be the observed outcomes for all counties at all time points, where each $y_t = (y_{1,t}, y_{2,t}, \ldots, y_{n,t})$ is the vector of observed outcomes for all counties at time point $t$. Finally, the joint density of $y$ and $\xi$ has the following form:

The joint pdf of $\xi$ is then derived as:

$$f(\xi \mid \beta, \alpha, \mu, \tau^2, \sigma^2) = f(\xi_0 \mid \mu, \sigma^2)\, f(\xi_{1:T} \mid \xi_0, \beta, \alpha, \tau^2) \qquad (5)$$

where $|\Omega_2|$ is the number of elements in the set $\Omega_2$ and $I(\cdot)$ is an indicator function. Finally, the joint density of $y$ and $\xi$ has the following form:

Bayesian posterior inference will be used to estimate the parameters of interest, which include $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$, $\alpha$, $\mu$, $\sigma^2$ and $\tau^2$. The prior distributions for the parameters are as follows:

The parameterization of the Gamma distribution prior for $\tau^{-2}$ has a mean of $a/b$ and variance

Table 1. The simulation result for the model with missing value constraints is shown in Table 2.

(Table 3). Thus, our final model only includes the intercept,

Agresti and Coull (1998) and used the following weighted average to approximate $p_{s,t}$:

where $z_\alpha$ is the $\alpha$ quantile of the standard normal distribution. Here we choose $z_\alpha = z_{0.975}$, so that $z_\alpha \approx 1.96$. This makes $\tilde{p}_{s,t}$ fall into $(0, 1)$ and shrinks the observed proportion of PEDV toward 0.5. When $N_{s,t}$ goes up, the weight assigned to $\hat{p}_{s,t}$ increases and the shrinkage becomes weaker. We implement this correction for all $p_{s,t}$'s, and then use $\mathrm{logit}(\tilde{p}_{s,t})$ as the initial value of $\xi_{s,t}$.
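The shrinkage correction described above can be sketched in Python. The weighted-average formula itself did not survive in this copy of the text, so the sketch assumes the standard Agresti and Coull adjusted proportion, $(y + z^2/2)/(N + z^2)$, which is equivalent to weighting the observed proportion by $N/(N + z^2)$ and the value 0.5 by $z^2/(N + z^2)$. The function names `shrunken_proportion`, `logit` and `initial_xi` are illustrative and are not taken from the paper.

```python
import math

Z_975 = 1.959963984540054  # z_{0.975}; the text rounds this to 1.96


def shrunken_proportion(y, n, z=Z_975):
    """Weighted average of the observed proportion y/n and 0.5.

    Assumed form: weight n / (n + z^2) on y/n, so the weight grows with n.
    """
    if n < 0 or not 0 <= y <= n:
        raise ValueError("need 0 <= y <= n")
    z2 = z * z
    weight = n / (n + z2)            # weight on the observed proportion
    p_hat = y / n if n > 0 else 0.5  # no tests at all: fall back to 0.5
    return weight * p_hat + (1.0 - weight) * 0.5


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def initial_xi(y, n):
    """Initial value for the latent logit, based on the shrunken proportion."""
    return logit(shrunken_proportion(y, n))
```

Because the shrunken proportion is strictly inside $(0, 1)$ even when all or none of the samples test positive, `initial_xi(0, 10)` and `initial_xi(10, 10)` are both finite, which is the practical reason for applying the correction before taking the logit.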
Parameter estimates were based on 15,000 MCMC iterations after discarding the first

parameters estimation results showed strong evidence of distance-dependency at a regional (county) level. Given the importance of efficient regional surveillance, the proposed model can be used to predict

Approximate is better than "exact" for interval estimation of binomial proportions
Exact goodness-of-fit tests for Markov chains
Epidemic and economic impacts of delayed detection of foot-and-mouth disease: a case study of a simulated outbreak in California
Evaluation of spatio-temporal Bayesian models for the spread of infectious diseases in oil palm
Bayesian data analysis
An adaptive Metropolis algorithm
The future of BSE from the global perspective
Bayesian modelling of inseparable space-time variation in disease risk
Bayesian disease mapping: hierarchical modeling in spatial epidemiology
Updated estimated economic welfare impacts of porcine epidemic diarrhea virus (PEDV)
Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer risks in Yorkshire (UK)

For the model under the complete data setting

Given initial values $\alpha^{(0)}$, $\sigma^{2(0)}$, $\tau^{2(0)}$ and $\xi^{(0)}$, for iteration $m = 1, \ldots, M$:

1. Sample $\beta^{(m)}$ from a multivariate normal distribution with mean and covariance matrix $\left(\tau^{-2} \sum_{t=1}^{T} X_t^{T} X_t + s_\beta^{-2} I\right)^{-1}$. Here $X_t = (X_{1,t}, \ldots, X_{n,t})$ is an $n \times$

for each $\xi_{s,t}$ is as follows.

• For $t = 0$: $(1 + \exp(\xi^{(m-1)}_{s,0}))^{-N_{s,0}} \exp y_{s,0}(\xi$

$(1 + \exp(\xi^{*}_{s,t}))^{-N_{s,t}} \exp y_{s,t}\, \xi^{*}$
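The expressions above share the factor $(1 + \exp(\xi))^{-N} \exp(y\,\xi)$, which is the binomial likelihood kernel written in terms of the logit $\xi$. The sketch below shows how such a factor could enter a single Metropolis update of one $\xi_{s,t}$. It is an illustration under stated assumptions, not the authors' sampler: the symmetric random-walk proposal, the step size `step_sd`, and the user-supplied `log_prior` (standing in for the remaining terms of the conditional density, which are truncated in this copy) are all hypothetical.

```python
import math
import random


def log_kernel(xi, y, n):
    """log of (1 + exp(xi))^(-n) * exp(y * xi), computed stably."""
    # log(1 + exp(xi)) without overflow for large |xi|
    log1p_exp = max(xi, 0.0) + math.log1p(math.exp(-abs(xi)))
    return y * xi - n * log1p_exp


def metropolis_update(xi_cur, y, n, log_prior, step_sd=0.5, rng=random):
    """One random-walk Metropolis update (assumed proposal, not from the paper).

    log_prior is a hypothetical callable returning the log of all
    remaining terms in the conditional density of xi.
    """
    xi_star = xi_cur + rng.gauss(0.0, step_sd)  # symmetric proposal
    lp_star = log_prior(xi_star)
    if lp_star == -math.inf:                    # proposal outside the support
        return xi_cur
    log_ratio = (log_kernel(xi_star, y, n) + lp_star
                 - log_kernel(xi_cur, y, n) - log_prior(xi_cur))
    # accept with probability min(1, exp(log_ratio))
    if math.log(1.0 - rng.random()) < log_ratio:
        return xi_star
    return xi_cur
```

As a usage note, an interval constraint such as $\xi_{s,t} \in [\mathrm{logit}(0.001), \mathrm{logit}(0.1)]$ can be imitated by passing a `log_prior` that returns `-math.inf` outside the interval; proposals that leave the interval are then always rejected.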