key: cord-0231403-rg3341p2
authors: Zhu, Shixiang; Bukharin, Alexander; Xie, Liyan; Santillana, Mauricio; Yang, Shihao; Xie, Yao
title: High-resolution Spatio-temporal Model for County-level COVID-19 Activity in the U.S
date: 2020-09-15
journal: nan
DOI: nan
sha: c22dd759cc80ba616acb56a1619011e9df9d7a28
doc_id: 231403
cord_uid: rg3341p2

We present an interpretable high-resolution spatio-temporal model to estimate COVID-19 deaths together with confirmed cases one-week ahead of the current time, at the county-level and weekly aggregated, in the United States. A notable feature of our spatio-temporal model is that it considers the (a) temporal auto- and pairwise correlation of the two local time series (confirmed cases and death of the COVID-19), (b) dynamics between locations (propagation between counties), and (c) covariates such as local within-community mobility and social demographic factors. The within-community mobility and demographic factors, such as total population and the proportion of the elderly, are included as important predictors since they are hypothesized to be important in determining the dynamics of COVID-19. To reduce the model's high-dimensionality, we impose sparsity structures as constraints and emphasize the impact of the top ten metropolitan areas in the nation, which we refer (and treat within our models) as hubs in spreading the disease. Our retrospective out-of-sample county-level predictions were able to forecast the subsequently observed COVID-19 activity accurately. The proposed multi-variate predictive models were designed to be highly interpretable, with clear identification and quantification of the most important factors that determine the dynamics of COVID-19. Ongoing work involves incorporating more covariates, such as education and income, to improve prediction accuracy and model interpretability.

The global spread of COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, has affected nearly everyone's lives on the planet. Even the largest economies' resources have been strained due to the large infectivity and transmissibility of COVID-19. As the number of cases of COVID-19 continues increasing, understanding finer-grained spatio-temporal dynamics of this disease and some of the leading factors affecting disease transmissions is critical to helping officials make policy decisions and curb the further spread of the disease.

Most of the previous research aimed at studying the spread of COVID-19 has focused on two key measurements: the number of confirmed cases and the number of deaths. Cases going up or down over time shed light on the rate of spread of COVID-19 at a given point in time -but it is only valid if enough people get tested. The limited testing ability resulted in a severe underestimation of COVID-19 cases in the pandemic's early stages [35] . For example, when there was not enough testing capacity, as was the case in New York City in March 2020, the number of cases reported was an undercount of actual cases, estimated to be much larger (up by a factor of 10) [23, 40] . Some studies have circumvented underestimation by considering the case positivity rate, which measures the percentage of total COVID-19 tests conducted that are positive. However, most of the widely-used COVID-19 data sets, such as the COVID Tracking Project [48] , only collect the total number of people with a completed polymerize chain reaction (PCR) test that returns positive as reported by the state or territory, which has a much lower spatial resolution (state-level) in comparison with the cases and deaths data (county-level). Such coarse-grained testing numbers would introduce extra noise to our model and would most likely be incapable of improving the confirmed case prediction accuracy at the county level. Deaths are also an important metric that most people care about regarding the virus's ultimate epidemiological impact. In contrast to the number of confirmed cases, the number of deaths is a good and accurate indicator for evaluating how serious a burden this pandemic is causing, not only on health care systems but also on the general public's mental health and well-being. Some epidemiological studies, such as [33] , also recommend tracking deaths, even though deaths lag behind new cases, typically by two weeks to a month.

A large amount of fine-grained data offers a unique opportunity to study the disease's spread dynamics from a micro-level view. For the United States, several teams have been working on collecting comprehensive COVID-19 tracking data, including daily counts of cases and deaths at the county level. Such data gives us a general picture of how the virus is spreading across metropolitan and micropolitan counties and how such dynamics are evolving. Besides considering the cases and the deaths, we also aim to study other critical local factors in transmitting the COVID-19. Recent studies [38] on the spread of COVID-19 show that besides the distance to the epicenter, other factors, such as subway and airport, are positively connected with the virus transmission. Moreover, both urban areas and population density are positively associated with the spread of COVID-19 after the outbreak. The proportion of the elderly population has also been identified as a key factor in the death rate. Therefore, we consider the within-community mobility and two critical demographic factors by taking advantage of the COVID-19 Community Mobility Reports [21] and the American Community Survey (ACS) [8] . These two data sets are publicly available and include detailed county-level statistics that provide insights into what has changed in response to policies aimed at combating the COVID-19 and what factors may affect the disease's transmission. As illustrated in Fig. 1 , in our model, we assume the numbers of cases and deaths in each county depend on the neighboring counties and major metropolitan areas in the U.S., which we refer to as hubs in spreading the disease. Local community mobility and demographic factors, including population and elderly population, are considered local covariates in the model, which also play a crucial role in the final number of deaths.

In this paper, we use a data-driven method incorporating a large-scale data set from multiple sources to predict the deaths and the confirmed cases of COVID-19 at the county level in the United States. Since death is a more accurate indicator for assessing the spread of the virus, we emphasize predicting county-level deaths' Figure 1 : An example of spatio-temporal covariates in our model for Coconino County, Arizona. Based on the counties in the U.S. as fundamental units, we assume the number of confirmed cases and deaths of COVID-19 reported in a given county are jointly related to the numbers reported in its adjacent counties (they are Kane, San Juan, Navajo, Gila, Yavapai, Mohave for Coconino in this example) and ten selected nationwide hubs (including San Francisco, Los Angeles, Seattle, Chicago, Atlanta, Miami, Washington, D.C., Boston, and New York). The numbers of cases and deaths also depend on some local covariates, such as community mobility level and some counties' demographic factors. trajectories instead of the confirmed cases. Our method's most notable contribution is considering the spatial structure among hubs and neighboring counties in modeling the cross-correlation between cases and deaths. We also present the effect of a wide variety of geographic community mobility and social demographic factors on the spread of COVID-19. Our approach drastically differs from previous studies [14, 6, 1] , in which the number of cases and deaths, and other covariates, including the community mobility and social demographic factors, are interlinked through a vector autoregressive process. Our model shows that these hubs play a pivotal role in spreading the disease. We also find that both cases and deaths are significantly related to the local level's total population and that deaths are also positively associated with the proportion of the elderly population. Additionally, we found that confirmed cases are not significantly related to the proportion of the elderly population, which may prove that the disease was mostly circulating among young people in its later stage. In particular, while we identify a spike in cases since the beginning of the summer, we do not observe a clear spike in deaths. This may be explained by the fact that a more significant proportion of young people, who are generally at lower risk of death, were infected in the more recent pandemic stages.

The remainder of the article is organized as follows. We first review related works in the rest of this section, followed by describing the data sets we have used in Section 2. Section 3 presents our proposed vector autoregressive model with spatial structure incorporated. We demonstrate the effectiveness of our model and discuss its interpretation in Section 4. Lastly, the article concludes with discussions and future research directions in Section 5.

Related work Compartmental models have been widely used in infectious disease epidemiological studies. In the SIR model [22] , one of the simplest compartmental models, the population is assigned to three components: S (susceptible), I (infectious), and R (recovered). These variables (S, I, and R) represent the number of people in each compartment. The transition between different compartments is modeled using a set of coupled differential equations. Based on the SIR model, many variants have been proposed in the last decades, including the SIRD (Susceptible-Infectious-Recovered-Deceased) model [18, 9] that considers deceased individuals, and the SEIR (Susceptible-Exposed-Infectious-Recovered) model [25, 59, 26, 27, 44 ] that considers exposed period during which individuals have been infected but are not yet infectious themselves, to name a few. The total population is usually assumed to be fixed in the compartmental models; therefore, it works well when modeling nationwide data. However, in our high-resolution modeling, each county's population is of high variability due to dynamics across the county. Therefore, we use a spatio-temporal model instead to capture the influence of major big cities and neighboring counties without fixing each county's population.

Besides compartmental models, much work has been done on predicting the total number of COVID-19 cases, and deaths without considering the spatial correlation across regions [39, 56, 58] . For example, recent work [7] introduces a regional model based on a self-exciting point process to forecast the total number of infections for multiple countries. Another work [20] provides a state-wise analysis and infections prediction for India's states by considering three growth models, namely, the logistic, the exponential, and the susceptibleinfectious-susceptible models. Machine-learning-based approaches have also been considered in modeling COVID-19 outbreak [3] . Some work [55] attempts to use a neural network to model accumulative case counts for multiple countries. Recurrent neural network-based methods [24, 60] have been applied to model the temporal dynamics of the COVID-19 outbreak. Our approach differs from these studies in two ways: (1) our model provides finer-grained predictions for the cases and deaths; (2) we model the multivariate time series by considering the spatial correlation across regions as well as the correlation with the demographic factors, which is more interpretable than the machine-learning-based methods.

Understanding the COVID-19 outbreak's spatial spread is critical to predicting local outbreaks and developing public health policies during the early stages of COVID-19. However, studies evaluating the spatial spread of the COVID-19 pandemic are scarce or limited [46] . Previous studies have described the spatial spread of severe acute respiratory syndrome (SARS) in Beijing and mainland China [42, 17, 30, 29, 15] using limited or localized data. One study also considered the various connections between a few cities to calculate the spatial association [42] . There is also prior work using the multivariate Hawkes process to model the conditional intensity of new COVID-19 cases and deaths in the U.S. at the county level [12] , without considering the influence of the big cities and other important demographic factors. The work of [4] develops two types of county-level predictive models based on the exponential and the linear model, respectively. It focuses on modeling the dynamics of cumulative death counts. In [31] , the graph neural networks are adopted to capture the spatio-temporal dynamics between various features; However, the lack of interpretability hinders from further understanding the mechanism of the COVID-19 outbreak. There is also a wide array of previous research based on autoregressive models that relate to our work. In [2, 5, 16, 34, 50, 53] , the autoregressive integrated moving average (ARIMA) is used to predict future data for different countries. [41] uses an autoregressive-based time series model to predict the total number of the world's confirmed cases. [51, 11] adopt the autoregressive artificial neural networks to predict the number of accumulative cases in Egypt. A more recent similar article [32] studies the state-wise cases in Pakistan using the vector autoregressive model. However, two significant differences are (1) the spatial resolution of their predictions is much lower than our results; (2) these models do not consider the spatial correlation between places in the vicinity and cities served as major transportation hubs.

There are also various efforts studying impacts from other aspects, such as temperature, humidity [52, 47] , age, gender [15] , and travel restrictions [13, 36] . Here, in our case, N = I × T = 157, 200. Most of these studies are constrained on a relatively small scale because of limited data at the pandemic's early stage.

We have used three comprehensive datasets in this study, including confirmed cases and deaths of COVID-19, community mobility data, and demographic census data. These datasets play an important role in understanding the spatio-temporal correlation of COVID-19 transmission.

We used the dataset from The New York Times [57] , based on state and local health agencies' reports. The data is the product of dozens of journalists working across several time zones to monitor news conferences, analyze data releases, and seek public officials' clarification on how they categorize cases. The data includes two parts: (i) confirmed cases are counts of individuals whose coronavirus infections were confirmed by a laboratory test and reported by a federal, state, territorial, or local government agency. Only tests that detect viral RNA in a sample are considered confirmatory. These are often called molecular or reverse transcription-polymerase chain reaction (RT-PCR) tests; (ii) confirmed deaths are individuals who have died and meet the definition for a confirmed COVID-19 case. Some states reconcile these records with death certificates to remove deaths from their count, where COVID-19 is not listed as the cause of death. These data have removed non-COVID-19 deaths among confirmed cases according to the information released by health departments, i.e., in homicide, suicide, car crashes, or drug overdose. All cases and deaths are counted on the date they are first announced. In practice, we have observed periodic weekly oscillations in daily reported cases and deaths, which could have been caused by testing bias (higher testing rates on certain days of the week). To reduce such bias, we aggregate the number of cases and deaths of each county by week.

Community mobility As global communities respond to COVID-19, we have heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps could be helpful as they make critical decisions to combat COVID-19. The COVID-19 Community Mobility Reports [21] aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports record people's movement by county daily, across various categories such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. The data shows how visitors to (or time spent in) categorized places change compared to the baseline days (in percentage). The negative percentage represents the level of mobility is lower than the baseline, and the positive percentage represents the opposite. A baseline day represents a normal value for that day of the week. The baseline day is the median value from the five weeks from January 3rd to February 6th, 2020. To match the temporal resolution with the COVID-19 data and detrend the weekly pattern, we aggregate each county's mobility data by week. Examples of three categories have been shown in Fig. 2 .

Demographic census Data from the American Community Survey (ACS) [8] , provided by the U.S. Census Bureau, is a comprehensive source for demographic information about the population, age, and economic status in each zip code region in the U.S. Unlike the census data, which takes place every ten years, the ACS is conducted every year. The latest ACS data are available in the year 2018. Some demographic factors help us understand how population distribution affects the spread of disease (by correlating the local socio-economic profile with its confirmed cases and deaths). These factors contain essential information about the development and economic growth of different areas. To match the spatial resolution with the COVID-19 data, we aggregate the zip code regions' demographic data in the same county. We selected two leading factors that affect the spread and the infection of the disease, i.e., total population and the proportion of the elderly with an age of 65 or older [45] .

This section presents our statistical model that captures the spatio-temporal correlation of the spread of COVID-19. We begin with a brief description of the problem setup and notations, then jointly model confirmed cases and deaths as a vector autoregressive process in Section 3.2. The essential notations defined in this section are also summarized in Table 1 . 

Set of all county pairs in the U.S. that i, j are adjacent to each other or one of i, j is a hub.

Case's coefficients depended on past confirmed cases between county i, j ∈ I for τ weeks ago.

Death's coefficients depended on past deaths between county i, j ∈ I for τ weeks ago.

Death's coefficients depended on past confirmed cases between county i, j ∈ I for τ weeks ago. µ k,τ ∈ R Coefficient for mobility category k ∈ K in the past τ -th week w.r.t. the number of cases. ν k,τ ∈ R Coefficient for mobility category k ∈ K in the past τ -th week w.r.t. the number of deaths. υ l ∈ R Coefficient for demographic factor l ∈ L w.r.t. the number of cases. ζ l ∈ R Coefficient for demographic factor l ∈ L w.r.t. the number of deaths.

Consider confirmed cases and deaths of the COVID-19 in N counties and T weeks (recall that we aggregated these numbers by week to reduce bias). Let I = {i = 1, . . . , N } be the set of counties and T = {t = 1, . . . , T } be the set of weeks starting from March 15th, 2020 until January 17th, 2021. We assume there is a set of counties I = {i = 1, . . . , N } ⊂ I playing a significant role in spreading the disease due to their high population density and well-developed transportation network connecting to other major cities in the U.S.. We refer to these counties as hubs, and the selected hubs are marked in Fig. 1 . Denote the number of confirmed cases and deaths in county i ∈ I and week t ∈ T as c i,t ∈ Z + and d i,t ∈ Z + , respectively. In our setting, T = 49, N = 3144, and N = 10. We also consider K mobility categories and L demographic factors as covariates of the model, where K = 6 and L = 2. Let K = {k = 1, . . . , K} be the set of community mobility categories and L = {l = 1, . . . , L} be the set of demographic factors. Denote the mobility score in category k ∈ K for county i ∈ I in week t ∈ T as m i,k,t ∈ R, and denote the data of demographic factor l ∈ L for county i ∈ I as z i,l ∈ R + . Let c t := [c 1,t , . . . , c N,t ] and d t := [d 1,t , . . . , d N,t ] denote the confirmed cases and deaths in week t ∈ T , respectively. Let m k,t := [m 1,k,t , . . . , m N,k,t ] denote the score of community mobility category k ∈ K for all counties i ∈ I in week t ∈ T . Let z l := [z 1,l , . . . , z N,l ] denote the data of demographic factor l ∈ L for all counties i ∈ I.

We consider a linear spatio-temporal autoregressive model where the number of confirmed cases (c t ) and deaths (d t ) is a time series regressed on their previous values and the mobility covariate m k,t and demographic covariate z l . Denote the time window's length that we consider in the past (the memory depth) as p. Based on previous studies [54] , it is known that the COVID-19 virus has an incubation period of around two weeks. Therefore, we choose p = 2 throughout this paper.

Define the augmented observation vector as (which contains both confirmed case and death counts): Then our spatio-temporal model can be written as a vector autoregressive (VAR) process:

where ⊗ is the Kronecker product and

In our model (1), the first-term captures the dependence on past confirmed cases and deaths; the second term captures the influence of past local community mobility; the third term captures the influence of local demography, which is held constant over time. Specifically, B τ and A τ contain the autoregressive coefficients for the number of confirmed cases and deaths, respectively; H τ describes the dependence of the current number of deaths on the number of confirmed cases in τ weeks ago. As an illustrative example shown in Fig. 4 , these three matrices share the same sparse structure, where the entry at (i, j) is zero if county i and county j are not adjacent and none of them is the hub. Formally, the set of adjacency pairs is defined by A = {(i, j) ∈ I : (i, j) is an edge of the graph G}; each node of G denotes a county, and there is an edge between two nodes whenever the corresponding counties are geographically adjacent or one of them is a hub. The µ k,τ , ν k,τ , υ l , and ζ l are four scalar coefficients. To be specific, µ k,τ , ν k,τ represent the coefficients for the local community mobility score in category k in τ weeks ago with respect to the corresponding number of confirmed cases and deaths, respectively. Similarly, υ l , ζ l represent the coefficients for local demographic factor l with respect to the corresponding number of confirmed cases and deaths, respectively. The spatial covariance matrix between the noise at counties i and j is denoted as the (i, j)-th entry of Σ η ; it is a function of their Euclidean distance s ij and is parameterized by η. Some commonly used spatial models include: Gaussian model [37] , Exponential model [19] , and Matérn model [19] . Here we adopt the exponential spatial covariance model Σ η (i, j) = η exp{−ηs ij }, where η is a pre-specified parameter, which controls the rate of spatial decay. In this paper, we specify a reasonable value of the parameter η = 10 3 . We aim to fit the model (1) for confirmed cases and deaths jointly by minimizing the prediction error. Define the set of parameters θ = {Λ, ω, γ} ∈ Θ, where Θ is the set containing all feasible values. For a pre-specified hyper-parameter δ ∈ [0, 1], the loss function is defined as a weighted combination of quadratic loss functions for death and confirmed case residuals:

where ε t,c denotes the confirmed case prediction residual

and ε t,d denotes the death prediction residual

The hyper-parameter δ controls the proportion of death prediction loss. In practical terms, we emphasize the importance of death, and hence we choose δ = 0.9 empirically. The reason is that it is known that the confirmed cases are quite noisy and can depend on the capacity of testings. The parameters θ can be estimated by solving the following optimization with a regularization function:

where λ 1 ≥ 0 is a parameter that controls the importance of the regularization term, and R(θ) is the elastic net type regularization function (with hyper-parameter λ 2 ∈ [0, 1]) given by

where 1 A {x} is the indicator function, i.e., taking the value 1 if x ∈ A otherwise 0; λ 2 is the 1 penalty ratio in the regularization function; α i,j,τ , β i,j,τ , h i,j,τ are the entries of matrices A τ , B τ , H τ , respectively.

Our model's most salient feature is that we consider the underlying spatio-temporal structure between the number of confirmed cases and deaths. If there is no specific structure in coefficient matrices, our methods look on the surface to be a naive linear model but require to solve a large-scale high-dimensional optimization problem, which contains 79,077,916 parameters (variables in the optimization problem) with only 84,888 data points. Instead of solving such complex problems directly, we tackle this challenge by exploiting the sparse spatial structure and only consider the correlation between adjacent counties and hubs, which leads to a significant reduction in the number of parameters (less than 80,000). Besides, the lower triangular structure of Λ τ matrix (including B τ , H τ , and A τ ) captures the causal relationship we believe exists in the confirmed case to the death count, but not the other way around. To be exact, we assume the number of confirmed cases in the past will result in the change of both the confirmed cases and deaths in the future, while the number of deaths only relates to the future's deaths. The regularization term we devised in Section 3.2 also plays a big part in achieving the ideal results. This elastic net-based method linearly combines the lasso and ridge regression penalties on B τ , H τ , and A τ to encourage sparse spatial correlation and stabilize the solution at the same time. The hyper-parameters λ 1 and λ 2 in the regularization term are chosen by 5-fold cross-validation, where the optimal choices are λ 1 ≈ 10 2 and λ 2 ≈ 10 −1 for the fitted model.

Here we solve the optimization problem by gradient descent. To fit the model, we first standardize the data of covariates and feed all the data as a single batch in one iteration, then descend the gradients of the parameters with respect to the loss defined in (2) until the model converges. To perform a one-week ahead prediction, we feed all the data before that week as a single batch in one iteration and follow the same gradient descent procedure described above. The model normally takes about 500 iterations to reach the convergence with (θ) ≈ 1.41 × 10 3 .

Now we report the results of our study. We evaluate the explanatory power of proposed modeling method by performing the in-sample estimation. We also compare our approach regarding the one-week ahead predictive performance against four other external benchmark methods, which are the current state-of-art methods adopted by the Centers for Disease Control and Prevention (CDC) in its national ensemble forecast [10, 49] . We generate the death (and case) prediction for each county in a week by taking the past county-level case and death records, community mobility, and demographic census information as input. The format of the input data is described in Section 2. In addition, we demonstrate the interpretable components of our model by showing the spatio-temporal correlation between the number of COVID-19 cases and other covariates discovered from our fitted model. For the ease of presentation, we only focus on the mainland and do not consider Hawaii, Alaska, and other unincorporated territories of the United States in this paper. Hereinafter, we refer to the proposed spatio-temporal vector autoregressive model as STVA.

To evaluate the effectiveness of our method, we compare the county-level in-sample estimation on the number of deaths. The in-sample estimation is a process of evaluating the model's explanatory capabilities using observed data to see how effective the model is in reproducing the data. The process can be carried out as follows: We first fit the model using the entire data set from March 15th, 2020 to January 17th, 2021, which contains 3,144 counties and 49 weeks in total. The in-sample estimation can then be obtained by feeding the same data into the fitted model and recovering the estimation for deaths according to (1) . We also carry out three ablation studies to further investigate the effectiveness of each model component. To be specific, we consider three variant models: (i) STVA−mobility removes the component of mobility (the second term in (1)); (ii) STVA−census removes the component of demographic census (the third term in (1)); and (iii) STVA−spatial simply removes the spatial correlation (the first term in (1)) by a diagonal matrix. We note that each county can also be regarded as an independent auto-regressive (AR) model without spatial correlation. We report the in-sample estimation of the proposed approach and the ablation models by aggregating the county-level estimated numbers in the same state. As shown in Fig. 5 , we select eight major states with the highest total number of deaths in the U.S. The shaded area indicates the actual number of deaths reported in the COVID-19 data set, and the solid red line indicates the in-sample estimated deaths by our model. We observe that the STVA (solid red lines) can capture the dynamics of true death trajectories better than the other three ablation models (dash lines). We also provide quantitative results for county-level MAE in Table 2 . As we can see, the result indicates that our model STVA generally attains the lowest MAE for all scenarios. There are also significant performance gains if we only focus on predicting the peak week and the most affected region in the U.S. The result suggests that these components are conducive to improving the model's performance. In Appendix A, we also present the in-sample estimation of the confirmed cases for the same eight states.

In addition to the in-sample estimation, we assess the model's predictive power by performing the one-week ahead (out-of-sample) prediction. The prediction procedure withholds the future data from the model estimation, then uses the fitted model to make predictions for the (hold-out) data in the next week.

Here, we compare our model against four benchmark methods adopted by CDC, which represent the current state-of-art for COVID-19 prediction: (1) COVID-19 Mortality Projections for the U.S. States by the University of Texas, Austin (UT) [56, 58] : they introduce a negative-binomial mixed-effects generalized linear model (GLM), i.e., the predictor is a GLM with a logarithm link function.; (2) COVID-19 Cases and Deaths Forecasts by the Los Alamos National Laboratory (LANL) [39] : the model assumes that a fraction of the newly generated cases will die and proposes a statistical model to capture this effect; (3) COVID-19 Modeling by the Northeastern University, Laboratory for the Modeling of Biological and Socio-technical Systems (MOBS) [44] : their team adopts a classic SLIR (Susceptible-Latent-Infectious-Removed)-like compartmentalization scheme for disease progression; (4) COVID-19 Projections for the United States by the Institute for Health Metrics and Evaluation (IHME) [27, 28] : this project considers a deterministic SEIR compartmental framework. In particular, all these four approaches mainly use the state-level records of the number of cases and deaths as the input of their models. At the same time, IHME also takes advantage of several critical driving covariates (pneumonia seasonality, mobility, testing rates, and mask use per capita). The prediction results of these benchmark methods are directly quoted from the CDC's official reports [10] . We only present the results from October 11th, 2020, for one-week ahead death prediction. The statistics before October are inaccurate due to the low testing rate, and the data are insufficient to fit the model. Similar to the in-sample estimation, we report the prediction results for the entire U.S. and eight top states with the highest number of deaths in Fig. 6 . It has shown that the aggregated county-level predictions suggested by the STVA (solid green lines) achieve competitive performance against other mainstream approaches (dash lines). It is worth noting that these four methods only provide state-level forecasting for the number of deaths, which is less challenging than the county-level prediction.

More quantitative results are summarized in Fig. 7 . The heatmaps show the mean absolute error (MAE) of the county-level estimation/prediction within a particular state and at certain weeks. The average MAEs over states and weeks are presented in the vertical line chart on the right and the horizontal line chart on the top. The states are sorted in the ascending order of their MAE from top to bottom. As shown in Fig. 7 , our model significantly outperforms the other ablation models while achieving competitive predictive performances compared to the other widely-adopted state-level approaches. We can also observe that: our model tends to achieve better performance for the states with larger populations, such as Florida, New York, Texas, et cetera; for the deaths, our model has a balanced performance in each state, and the MAE is getting better (smaller) and becoming more stable after the summer surge of the COVID-19 (from June to July).

Our study focuses on exploring the in-sample explanatory content of predetermined factors in our model. We fit the model using the entire data set collected from three data mentioned above sources in Section 2, and interpret the model by examining its fitted coefficients. represents the value of coefficient α i,j,1 in A 1 , where county j is Chicago. Counties in blue indicate their current number of deaths is positively related to its number of deaths in the last week; counties in red are the opposite; counties in white represent no discernable correlation between the two numbers. Coefficients of different hubs show the various spatial pattern in "spreading" or "controlling" the disease.

Spatio-temporal dependencies between cases and deaths The experimental results demonstrate a distinctive underlying spatio-temporal pattern between confirmed cases and deaths of the COVID-19. We first report the coefficients of five representative hubs in Λ 1 in Fig. 8 . The hubs' coefficients in B 1 , H 1 , A 1 reveal their spatial dependencies between each pair of past cases and current cases, past cases and current deaths, and past deaths and current deaths, respectively. As we can see, hubs have a strong "radiating" power on most of the United States regions and contribute a great deal to promote or curb the spread of Apart from analyzing the spatial structure across regions learned by the model, we also study the temporal dependences in the past two weeks for the same hub. In Fig. 9 , we present three typical pairs of comparisons for coefficients between one-week lag and two-week lag: coefficients of Atlanta in A 1 and A 2 , coefficients of Seattle in B 1 and B 2 , and coefficients of Los Angeles in H 1 and H 2 . All three comparisons share one thing in common: coefficients of different time lag have a similar spatial pattern, but the overall coefficients of two-week lag are relatively smaller than corresponding ones of one-week lag. This indicates that the last week has a stronger influence. We also observe that the spatial pattern in A, B, and H are significantly different from each other according to Fig. 8 and Fig. 9 . This observation confirms that each spatial component described by matrices A, B, and H plays a different role in capturing the spatial correlation. For example, we can observe that the visualization of the matrix H always shows large positive coefficients in New England area while the coefficients in matrix B in the same area being typically more negligible or even negative. This may be related to the high death rate in the New England area [43] since H captures the spatial correlation between previous cases and future deaths, and B only concerns the similar correlation between cases. Last but not least, we further investigate the p-values of these spatial coefficients, as finding statistically significant relationships between the prediction and the observation is of great importance to the model evaluation.

Since A 1 plays a key role in predicting future deaths by being connected to the death observations in the past, we take the hubs' coefficients in A 1 as an example. As shown in Fig. 10 , the number of deaths in these hubs is statistically significant to the death counts in other populated regions in the U.S. (New England, the Southeastern United States, and the Western united states). In particular, we find the hubs in the west and the north (Los Angeles, Washington D.C., New York, Seattle, and San Francisco) are statistically significant to a broader area than the hubs in the South (Houston, Dallas, and Miami). Table 3 summarizes the fitted coefficients of local covariates in the model. The first and second rows indicate the corresponding time lag and the category of coefficients, respectively. The first 12 columns correspond to the community mobility, and the last two columns correspond to the demographic factors. Positive coefficients have been put in bold to highlight the positive correlation with cases or deaths. The coefficients can be compared across factors as the covariates are standardized first. As we can see, most of the covariates are statistically significant, with small p-values (< .05) except for the proportion of the elderly population age 65 and older. The positive coefficients are in bold, which indicates a positive correlation between the covariates and the cases or deaths. In particular, we observe that, for the cases, the coefficients of mobility in workplaces, retail, and recreation, transit stations have large positive values (> 9 × 10 2 ), which indicates that the increase of mobility in these areas led to the rapid spread of the COVID-19. However, things are the opposite for the deaths, where the coefficients of mobility in grocery and pharmacies, parks, residential have large positive values (> 2 × 10 2 ). Moreover, the population's coefficient for the cases is significantly larger than the other covariates, and it confirms that population density is the dominant factor in spreading the disease. Last, we have found the proportion of the elderly population is significantly related to the deaths and has no clear connection to the cases.

While still in the development stages, the proposed spatio-temporal model has shown immense promise in modeling and predicting the deaths and confirmed cases of COVID-19 in the United States. Nevertheless, there remain numerous open questions and rooms for improvements. For example, the uncertainty in the count data commonly exists and can affect accuracy. It would be interesting to incorporate the serology data as an additional data source to calibrate our model. To avoid negative output, we may adapt the current problem into a Poisson regression with the log-linear model or apply a Rectified Linear Unit (ReLU) to the output to disallow the negative values. It assumes the response variable x t has a Poisson distribution and assumes the logarithm of its expected value can be modeled by the linear model defined in (1) . In particular, this adaption plays a vital role in predicting states with fewer confirmed cases and deaths, such as Hawaii and Delaware.

Economic activity and the spread of viral diseases: Evidence from high frequency data

A poisson autoregressive model to understand covid-19 contagion dynamics

Covid-19 prediction and detection using deep learning

Curating a COVID-19 data repository and forecasting county-level death counts in the united states

Forecasting the spread of the covid-19 pandemic in saudi arabia using arima prediction model under current public health interventions

Multiscale mobility networks and the spatial spreading of infectious diseases

The challenges of modeling and forecasting the spread of covid-19

Chinese and italian covid-19 outbreaks can be correctly described by a modified sird model. medRxiv

Centers for Disease Control and Prevention. COVID-19 forecasts: Deaths

Theta autoregressive neural network model for covid-19 outbreak predictions. medRxiv

Hawkes process modeling of covid-19 with mobility leading indicators and spatial covariates. medRxiv

The effect of travel restrictions on the spread of the 2019 novel coronavirus (covid-19) outbreak

The role of the airline transportation network in the prediction and predictability of global epidemics

Chinese Center for Disease Control Epidemiology Working Group for NCIP Epidemic Response and Prevention. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (covid-19) in china

An approach to measure the death impact of covid-19 in jakarta using autoregressive integrated moving average (arima)

Geographical spread of sars in mainland china

Estimating and simulating a sird model of covid-19 for many countries, states, and cities. Working Paper 27128

Spatial statistics and modeling

Covid-19 in india: Statewise analysis and prediction. JMIR public health and surveillance

Covid-19 community mobility reports

Exact analytical solutions of the susceptible-infectedrecovered (sir) epidemic model and of the sir model with equal death and birth rates

Seroprevalence of antibodies to sars-cov-2 in 10 sites in the united states

Generated time-series prediction data of covid-19' s daily infections in brazil by using recurrent neural networks

The mathematics of infectious diseases

The effectiveness of quarantine of wuhan city against the corona virus disease 2019 (covid-19): A well-mixed seir model analysis

Institute for Health Metrics and Evaluation. COVID-19 projections for the united states

Institute for Health Metrics and Evaluation. Modeling covid-19 scenarios for the united states

Population flow drives spatio-temporal distribution of covid-19 in china

Spatial epidemic dynamics of the covid-19 outbreak in china

Examining covid-19 forecasting using spatio-temporal graph neural networks

Modelling and forecasting of new cases, deaths and recover cases of covid-19 by using vector autoregressive model in pakistan

Evaluating covid-19 lockdown and business-sector-specific reopening policies for three us states

Azim Doguş Tuncer, and Fikret Şinasi Kazancıoglu. Comparative analysis and forecasting of covid-19 cases in various european countries with arima, narnn and lstm approaches

Unmasking the actual covid-19 case count

Effect of non-pharmaceutical interventions to contain covid-19 in china

Spatiotemporal biosurveillance with spatial clusters: control limit approximation and impact of spatial correlation

Emerging study on the transmission of the novel coronavirus (covid-19) from urban perspective: Evidence from china

LANL COVID-19 cases and deaths forecasts

Estimating the prevalence of covid-19 in the united states: Three complementary approaches. medRxiv

Time series modelling to forecast the confirmed and recovered cases of covid-19. Travel medicine and infectious disease

Understanding the spatial diffusion process of severe acute respiratory syndrome in beijing

COVID-19 deaths per 100k

Laboratory for the Modeling of Biological and Socio-technical Systems. COVID-19 modeling

Case-fatality rate and characteristics of patients dying in relation to covid-19 in italy

Real-time forecasting of the covid-19 outbreak in chinese provinces: Machine learning approach using novel digital data and estimates from mechanistic models

The role of environmental factors on transmission rates of the covid-19 outbreak: An initial assessment in two spatial scales

The COVID Tracking Project

Spatial prediction of covid-19 epidemic using arima techniques in india. Modeling earth systems and environment

Forecasting the prevalence of covid-19 outbreak in egypt using nonlinear autoregressive artificial neural networks

Temperature and latitude analysis to predict potential spread and seasonality for covid-19

Prediction of the covid-19 pandemic for the top 15 affected countries: Advanced autoregressive integrated moving average (arima) model. JMIR public health and surveillance

A review of coronavirus disease-2019 (covid-19)

Forecasting of covid-19 cases based on prediction using artificial neural network curve fitting technique

The University of Texas COVID-19 Modeling Consortium. COVID-19 mortality projections for US states

The New York Times. We're sharing coronavirus case data for every u.s. county

Projections for first-wave covid-19 deaths across the us using social-distancing measures derived from mobile phones. medRxiv

Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions

How well can we forecast the covid-19 pandemic with curve fitting and recurrent neural networks? medRxiv