key: cord-0956863-q37dl4gb
authors: Carroll, Rachel; Prentice, Christopher R.
title: Using spatial and temporal modeling to visualize the effects of U.S. state issued stay at home orders on COVID-19
date: 2021-07-06
journal: Sci Rep
DOI: 10.1038/s41598-021-93433-z
sha: 5d15d9c3b34f0bcd5d24fc56d30f7168e978a069
doc_id: 956863
cord_uid: q37dl4gb

Coronavirus disease 2019 dominated and augmented many aspects of life beginning in early 2020. Related research and data generation developed alongside its spread. We developed a Bayesian spatio-temporal Poisson disease mapping model for estimating real-time characteristics of the coronavirus disease in the United States. We also created several dashboards for visualization of the statistical model for fellow researchers and simpler spatial and temporal representations of the disease for consumption by analysts and data scientists in the policymaking community in our region. Findings suggest that the risk of confirmed cases is higher for health regions under partial stay at home orders and lower in health regions under full stay at home orders, when compared to before stay at home orders were declared. These results confirm the benefit of state-issued stay at home orders as well as suggest compliance to the directives towards the older population for adhering to social distancing guidelines.

A combination of the stay at home orders variable and standardized incidence ratios by health region for the confirmed cases outcome. (A) Before stay at home orders, (B) During full stay at home orders, (C) During partial stay at home orders, and (D) Never stay at home orders. Health regions are only present in panes where the stay at home orders category applied. www.nature.com/scientificreports/ risk increases in regions with higher percentages of Black or African American residents, and this finding is particularly noticeable in the South region. Some evidence exists that risk is also higher where there is a higher percent of smokers and a higher percent older population (light gray highlighted estimates). The testing variable was least consistent among the predictors in its effect on confirmed cases and deaths outcomes. It appears that more positive tests, both count of positive tests and percent positive tests, suggest more confirmed cases but not necessarily more deaths. Finally, all models include the same parameters for comparison, and they are well estimated at the 0.05 or 0.10 level for at least one model indicating an important relationship between that variable and the COVID-19 outcomes.

The results presented here demonstrate the usefulness of these spatio-temporal statistical models for identifying several important risk factors for a range of COVID-19 outcomes. The use of Bayesian inference is both relevant to this inquiry and valuable as a benchmark and template for researchers to use for additional applications in data science. Additionally, the dashboards offer an ideal interactive environment for displaying these results for a range of interested stakeholders.

The two state-level spatio-temporal variables were among the most important fixed effects in the model, despite their different roles. The stay at home orders variable provided important predictive information to describe differences in risk for health regions under the four levels of the variable, while the testing variable served as a control variable for the number and positive rate of tests performed. For the stay at home orders predictor, all levels were well estimated for nearly all models, and there was general consistency across the models with respect to the direction of association for all models except, notably, the West model. Interestingly, these estimates show that there is generally less risk for health regions under full stay at home orders and health regions that were never under stay at home orders, and more risk for health regions under partial stay at home orders when compared to Table 1 . Fixed effect parameter estimates for the confirmed cases and deaths outcomes represented as inverse logarithm mean estimates and associated 95% credible intervals. A Percent of positive tests was used for the Midwest and Northeast while log count of positive tests was used for all other models. B There is no estimate associated with the never category of the stay at home orders variable for the NE model because all states in the Northeast issued a stay at home orders. *Indicates well estimated Bayesian inference at the 0.05 level. ~ Indicates borderline Bayesian inference. Gray highlights indicate common estimate patterns mentioned in the text. Figure 2 displays the spatial and temporal random effect estimates for confirmed cases across the United States. These visuals display the residual spatial and temporal variation in the outcome after adjusting for the fixed effects in this table. Hence, increased (estimates greater than 1) or decreased (estimates less than 1) risk could arise from reasons pertaining to social distancing adherence or alternate population characteristics not considered in this study. In the spatial random effect, state borders are largely well defined here indicating consistency in the residual within states. After adjustment, many of the health regions with increased risk include parts of Idaho, Louisiana, Montana, and Alaska, Texas, Oklahoma, Arkansas, West Virginia, Virginia, Maryland, Delaware, and Hawaii. Alternatively, areas of decreased risk include parts of Louisiana and Mississippi, California, Florida, and New York. Finally, the temporal random effect parameter largely accounts for delays and errors in the data 11 www.nature.com/scientificreports/ health regions before stay at home orders were implemented. It is possible that governors in states with full stay at home orders are appropriately maintaining these orders where health regions are at greater risk to maintain control of disease spread within their borders. For states with partial stay at home orders the findings suggest they may have prematurely removed restrictions or, perhaps it was a calculated risk and governors surmised that the hospital systems were prepared to accommodate the potential influx of cases. Less risk in states with partial or full stay at home orders could indicate that stay at home orders were successful and removal could be considered. Similarly, less risk for health regions that were never under stay at home orders could suggest that these governors made the appropriate decision for their states. When considering results pertaining to testing, it is important to remember that these estimates do not suggest increased risk with more testing as a cause and effect relationship-merely that more comprehensive testing will necessarily increase the numbers of confirmed cases. Indeed, greater testing gives a better representation of the true disease presence in the country. Our examination time window and spatial aggregation may not be appropriate to examine the relationships between testing and the death outcomes. Examinations of these data in a recently published study suggest that there is a lag time of up to eight weeks between testing and death, potentially depending on the age of the individual 12 .

Other notable results uncovered here include decreased risk for health regions with more unemployment, increased risk for health regions as percent of Black or African American population increases in the South, and lower risk of new cases but higher risk of death for health regions with more older population. It is possible that with more unemployment, the frequency and volume of social interactions are diminished resulting in lower levels of disease transmission. Recall that these are unemployment estimates for March 2020, so health regions with more unemployment may also be health regions where more individuals were laid off or furloughed during the crisis. The finding that there is greater risk for health regions with more Black or African American population in the South aligns with anecdotal reporting and other findings regarding the disproportionate effects of this crisis on minority populations 13 . Incidentally, the South is an ideal place for the examination of minority populations because the presence is greater there than in other parts of the county. Finally, the age-related results-for regions with populations that have more residents aged 65 and older-suggest that older individuals adhere better to the social distancing standards but are still more susceptible to death, findings that reaffirm results in other studies 14, 15 .

Although the fixed effect estimates were quite similar across models, the spatial and temporal random effects were different when the US Census-defined regions were compared to those generated for the entire United www.nature.com/scientificreports/ States. Some of these differences are a consequence of scaling the random effects in the different models, but several indicators suggest an opposite risk direction. This finding indicates that there are differences in the COVID-19 crisis across the country; thus, it is critical to consider these areas on their own. The US Censusdefined region with the most consistent spatial random effect distribution to the one estimated for the United States as a whole was the Northeast. It is possible that the Northeast was driving much of the US-based model because the outbreak was centralized in that region for most of this time period. The West US Census-defined region was the most inconsistent with respect to the fixed effect estimates. We are not able to ascertain from these findings the exact cause of the disjuncture, but present two possible factors that could be explored in future academic inquiry: (1) the geographic and cultural heterogeneity within the West region and (2) the nature and duration of the stay at home orders in both California and Washington that appear different from other regions. The statistical model implemented in this study has both advantages and disadvantages for understanding disease spread. First, this model appears robust to data delays that are inherent in real-time data 11 . For example, there is a strong seasonal trend in the data where more new cases are reported later in the week versus earlier in the week. However, that same trend is not evident in the temporal random effect estimates, and we observed no well estimated parameters when including a variable that attempted to adjust for this trend. The reason for the models' robustness lies in the expected count ( e ij ) term, since that matrix by definition includes the same seasonal trend. One negative feature for these models involves the computational efficiency. While implementation via the INLA package in R speeds up the model fitting process, these complex models become more computationally inefficient with every day of new data generation. As such, real-time, or daily as used here, updates become less feasible the longer the crisis continues. Our current real-time rendering of these models uses weeks as the temporal unit for this reason. Finally, this is not a causal model, and we cannot determine the causes of higher or lower risk of COVID-19. Rather, we observe common characteristics among areas with higher or lower risk of COVID-19. While we include socio-demographic covariates, we cannot make claims of social determinants of health from our parameter estimates.

In sum, this article demonstrates how statistical models of this nature and caliber can illuminate important relationships with disease spread and identify relevant population-level characteristics. Most importantly, we believe that these results confirm the benefit of state-issued stay at home orders as well as suggest compliance with respect to the directives towards the older population for adhering to social distancing guidelines.

Data. Although most of the spatial data employed in this analysis are available at the county level, spatio-temporal statistical modeling of these data at this aggregation is complicated for two major reasons: (1) zero inflation and (2) large dimension. Both of these complications would render our methods inappropriate and inefficient. In order to avoid the issues associated with county-level modeling, we used a broader spatial aggregation-health regions, which are one or more counties serviced by the same health department 16 . By adopting health regions, the zero count of confirmed cases (as of May 18, 2020) dropped from 33,206 to 302 and the spatial dimension shrank from 3,142 (number of US counties) to 389 (number of US health regions). Correspondingly, we aggregated all county-level predictors and outcomes to the health region level. There is no evidence of loss of power with the aggregation of these data to the health region level 16 .

Science and Engineering 1 . Data for this study were geographically constrained to the United States and confined to the first eight weeks of recorded data (March 23, 2020 to May 18, 2020). Limiting the window of time to the first eight weeks was necessary and appropriate for three reasons. First, doing so guaranteed good representation in the two temporal varying predictors (state-issued stay at home orders and testing prevalence), particularly for the timeline pertaining to stay at home orders since several states had partially reopened but none had fully reopened. Second, by constraining the window of time it afforded the ability to examine data on a daily temporal scale as opposed to the real time models now that show data in weeks for computational efficiency. Finally, confining the data in this way allowed for the inclusion of testing information, which lags behind case counts in data release.

We examined four outcomes: number of confirmed cases, number of active cases, number of new cases, and number of deaths. Recovery data is not captured at the US county level and therefore is not incorporated as a stand-alone outcome or in the calculation of active cases. Additionally, there could be differences in covariate relationships based on the region of the country considered. As such, we made an estimation for the US as a whole and apportioned the numbers into the following regions as defined by the US Census Bureau 17 : South (S), West (W), Midwest (MW), and Northeast (NE).

Covariates for adjustment. All models adjust for the same neighborhood-level covariates, which were selected based on two criteria: to describe the state-level responses to the pandemic and to represent population level characteristics that may contribute to the disparities in outcome counts between health regions. We elaborate on the selection of predictors in the paragraphs below. The statistical model indirectly adjusts for differences in population across the observed areas (see the Statistical model section below for more details). Images (gif or png) for all predictors considered are available for review in the supplemental files.

To understand and adjust for the state-level response to COVID-19, we considered two important temporal predictors, the state-by-state issuance of stay at home orders and the rate of testing. These variables are specified at the state-level because they are either mandated at the state-level or are only available at that level of aggregation, www.nature.com/scientificreports/ respectively. The self-constructed predictor representing presence of statewide stay at home orders required daily updates when running in real-time. The most current, convenient, and reliable information for this categorical variable (levels: before, during-partial, during-full, and never) came from the New York Times 7 . To ensure accuracy, we concurrently examined government mandates for more information. To construct the testing variable, we produced either a log transformation of positive test count or a percent of positive tests variable. The variable varied by US Census-defined region (US, S, W, MW, and NE) depending on what appeared best fitting according to goodness of fit criterion and most appropriate in terms of estimate directionality as a control variable 18 . The region-specific testing variable was consistent across outcomes within that region. Population characteristics were also included as predictors in this model, specifically the presence of an international airport, land area of the health region, percent of the population that is unemployed, percent of the population that is a smoker, percent of the population that is aged 65 and older, and percent of the population that is Black or African American. The binary variable for presence of an international airport was used to represent community connectivity to international locations, which could be an important driver of disease progression particularly in this time window at the start of the pandemic. This predictor was created by combining information on airport locations from the U.S. Bureau of Transportation Statistics 4,5 and international flights for each of those airports from the US Department of Transportation.

Whereas urban areas typically comprise small health regions, rural areas are found in larger health regions. Size of a health region necessarily influences important factors like access to care, access to quality care, and capacity of health providers. Therefore, we adjusted for urbanicity by including information on the land area of each observation. Unemployment was estimated as the percent of the population that was unemployed in March 2020 in each health region using information from the Bureau of Labor Statistics 6 . The rest of the population demographic variables-smoking, age, and race-are 2018 estimates from the US Census Bureau and were included as proxies for comorbidities and vulnerable populations 3,19 . Statistical model. The statistical model applied here is commonly used in disease mapping for aggregated count outcomes, and is a variation of a Bayesian spatio-temporal Poisson Knorr-Held model [8] [9] [10] [20] [21] [22] . This model can be defined as follows for health region i and day j:

where y ij is the outcome of interest (count of confirmed cases, active cases, new cases, or deaths), µ ij is the mean of the Poisson model, e ij is the expected count, θ ij is the relative risk, X ′ ij is the design matrix for the predictors, β is the vector of fixed effect estimates, u i is the spatial random effect, and γ j is the temporal random effect. e ij is calculated as the rate of infection across all health regions on a given day times the population at risk for a given health region, which is assumed constant over time. As such, a unique e ij is produced for each health region and day. Given the Bayesian methodology applied here, prior distributions were required for all parameters and they were defined as follows: β k ∼ N 0, τ −1 β for each of the k fixed effect parameter estimates, u i ∼ N 0, τ −1 u for an uncorrelated spatial random effect, γ j ∼ N γ j−1 , τ −1 γ for a temporal random walk effect, and all precisions were such that τ ∼ Gam(2, 1) . All these prior distributions are considered non-or weakly-informative and sensitivity analysis (data not shown) suggests little influence on the resulting parameter estimates. This model description applies to all outcomes (confirmed cases, active cases, new cases, and deaths) and US Census-defined regions (US, S, W, MW, and NE) considered.

Computational details. R statistical software furnished much of the means for data processing and analysis in the work presented here. Specifically, the R packages rgdal, INLA, and fillmap were necessary for spatial data processing, statistical modeling, and spatial plotting respectively [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] . We, along with several others from the UNCW Data Science program, produced data visualization dashboards with an array of popular software including R Shiny, Tableau, and Power BI 2 . In addition to the regression model results, these dashboards also display general spatial, temporal, and spatio-temporal tracking of COVID-19 cases and deaths at the county, state, national, and international levels. Code for the statistical models and Shiny apps are available at this GitHub repository. 

An interactive web-based dashboard to track COVID-19 in real time

Web Application Framework for R

See Which States Are Reopening and Which Are Still Shut Down. The New York Times (2020). See Which States Are Reopening and Which Are Still Shut Down

Handbook of Spatial Epidemiology

Bayesian disease mapping for public health

On identifying and mitigating bias in the estimation of the COVID-19 case fatality rate

Visualizing the lagged connection between COVID-19 cases and deaths in the United States: An animation using per capita state-level data

The disproportionate impact of COVID-19 on racial and ethnic minorities in the United States

Severe outcomes among patients with coronavirus disease 2019 (COVID-19)-United States

Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy

A comparison of spatial resolution in North Carolina. High Sch

Census regions and divisions of the United States

County-Level Smoking Data

Bayesian modeling of inseperable space-time variation in disease risk

Modelling risk from a disease in time and space

Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology

Bindings for the 'Geospatial' Data

Spatial and spatio-temporal models with R-INLA

On fitting spatio-temporal disease mapping models using approximate Bayesian inference

Bayesian computing with INLA: New features

A primer on disease mapping and ecological regression using INLA

Spatio-temporal disease mapping using INLA

Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations (with discussion)

Create maps with SpatialPolygons objects

R: A language and environment for statistical computing. (R Foundation for Statistical Computing

Comparing INLA and OpenBUGS for hierarchical Poisson modeling in disease mapping

This research was supported in part by the University of North Carolina Wilmington's Center for Social Impact with special thanks to Cape Fear Collective.

R.C. performed the statistical analysis. All authors wrote and reviewed the manuscript.

The authors declare no competing interests.

Supplementary Information The online version contains supplementary material available at https:// doi. org/ 10. 1038/ s41598-021-93433-z.Correspondence and requests for materials should be addressed to R.C.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.