key: cord-0765084-q9i6vkr9
authors: Cruz-Cano, Raul; Ma, Tianzhou; Yu, Yifan; Lee, Minha; Liu, Hongjie
title: Forecasting COVID-19 Cases Based on Social Distancing in Maryland, USA: A Time–Series Approach
date: 2021-05-19
journal: Disaster medicine and public health preparedness
DOI: 10.1017/dmp.2021.153
sha: 68f425d0cda5000f6cd6a93ce9476345b6743935
doc_id: 765084
cord_uid: q9i6vkr9

OBJECTIVE: Our objective is to forecast the number of coronavirus disease 2019 (COVID-19) cases in the state of Maryland, United States, using transfer function time series (TS) models based on a Social Distancing Index (SDI) and to determine how their parameters relate to the mechanics of the pandemic. METHODS: A moving window of 2 mo was used to train the transfer function TS model, which was then tested on the following week's data. After accounting for a secular trend and weekly cycle of the SDI, a high correlation was documented between it and the daily caseload 9 days later. Similar patterns were also observed in the daily COVID-19 cases and incorporated in our models. RESULTS: In most cases, the proposed models provide a reasonable performance that was, on average, moderately better than that delivered by TS models based only on previous observations. The model coefficients associated with the SDI were statistically significant for most of the training/test sets. CONCLUSIONS: Our proposed models that incorporate SDI can forecast the number of COVID-19 cases in a region. Their parameters have real-life interpretations and, hence, can help understand the inner workings of the epidemic. The methods detailed here can help local health governments and other agencies adjust their response to the epidemic.

The rapid, global spread of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused hundreds of thousands of deaths. Although social distancing is considered a key measure to reduce the spread of the virus, 1 the exact impact of day-to-day social distancing on viral spread remains unclear.

Since the report of the first confirmed case of coronavirus disease 2019 (COVID-19) in Maryland on March 5, 2020, 2 more than 334,000 cases and 6,700 deaths have been reported. 3 On March 16, 2020, the state government implemented restrictions on gatherings and closure of educational facilities, and on March 30, 2020, a stay-at-home order was imposed. The Maryland Transportation Institute (MTI) implemented the Social Distancing Index (SDI) to measure the extent to which residents and visitors are practicing social distancing. 3 To date, studies have evaluated the efficacy of social distancing strategies to reduce the magnitude of the epidemic, 4 but models that accurately forecast daily caseloads based on social mobility patterns are yet to be explored.

In this article, we propose to use a sequence of time series (TS) models to forecast and further understand the relationship between social distancing and the COVID-19 daily caseload. Transfer function TS models can accomplish this objective because they present a dependable way to analyze data in which the current value of a variable (eg, daily COVID-19 cases in Maryland) depends on its previous values and on those of other predictors, such as the SDI. Other TS models have been used to accurately analyze this and previous pandemics. 5 However, to our knowledge, this is the first attempt to develop a TS model that includes a social distancing measure to predict daily COVID-19 cases. The magnitude of the COVID-19 epidemic makes it worthwhile to keep exploring any possible way to improve the models that can predict its behavior.

Data used in this study covered the timeframe from March 5, the day the first COVID-19 case was reported in Maryland, to June 1. On May 26, demonstrations against police violence started in Minnesota and spread all over the country in the next days. There exists anecdotal evidence that, due to use of preventive measures, such as face masks, these massive gatherings changed the relationship between social distancing and COVID-19 cases 6 ; hence, we decided not to include information on or beyond this date in the creation of the models.

The variables included in our study are: In TS, a general transfer function can be used to describe the relationship between an input and an output series. 7 We propose to use a transfer function TS model with the following 3-step procedure to relate the input daily SDI to the output COVID-19 cases series 7 while accounting for the secular trend and weekly cycles of the exposure and outcome variables:

• Step 1: Fit an autoregressive integrated moving average (ARIMA) model to the independent variable SDI. This step helps to find patterns that need to be removed from the independent variable before we study its relationship with the outcome. An ARIMA model is defined by its parameters (p,d,q), where p represents the order of autoregression and q the order of the moving average. The parameter d is the degree of differencing, eg, d = 1 means that, instead of the original SDI series, we would use (SDI_t − SDI_{t−1}). We used plots of the autocorrelation function (ACF) and partial ACF (PACF) to determine the values of these parameters, which is a well-known procedure applied in many published works. 5,7
• Step 2: Remove the patterns from the input series SDI and compute the cross-correlation with the daily and imported COVID-19 cases. This step helps to determine the pure delay s in the system after removing the SDI patterns discovered in the previous step (pre-whitening). We defined s as the delay (≥ 0 days) at which the largest cross-correlation occurred.
• Step 3: Compute the transfer function and fit it with a noise model. The study of the cross-correlation graph of the pre-whitened input and outcome series can help to determine the terms that are needed in the numerator and denominator of the transfer function model. This is a complex matter and beyond the scope of this work, but in general a simpler model is recommendable. 7 The analysis of residuals produced by the transfer function TS model might indicate the need to include more terms to improve its fit. 7
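The pre-whitening and delay search of Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration, not the SAS procedure used in the study: the function names (`difference`, `ar_filter`, `cross_correlation`, `estimate_delay`) are hypothetical, the AR coefficients are assumed to have been estimated already, and "largest cross-correlation" is assumed here to mean largest in absolute value.

```python
from statistics import mean


def difference(x):
    """First difference x[t] - x[t-1], i.e. the d = 1 step of an ARIMA model."""
    return [b - a for a, b in zip(x, x[1:])]


def ar_filter(x, coefs):
    """Remove an AR pattern: e[t] = x[t] - sum(phi_k * x[t-k]).

    `coefs` maps lag k -> phi_k (eg, {1: 0.3, 7: 0.2}).
    The output starts at t = max lag, so it is shorter than the input.
    """
    m = max(coefs)
    return [x[t] - sum(phi * x[t - k] for k, phi in coefs.items())
            for t in range(m, len(x))]


def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag] for lag >= 0.

    Assumes x and y have the same length.
    """
    n = len(x) - lag
    xs, ys = x[:n], y[lag:lag + n]
    mx, my = mean(xs), mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs)
           * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den


def estimate_delay(x_white, y_filtered, max_lag):
    """Delay s in 0..max_lag with the largest absolute cross-correlation."""
    return max(range(max_lag + 1),
               key=lambda s: abs(cross_correlation(x_white, y_filtered, s)))
```

In use, the same differencing and AR filter would be applied to both the input (SDI) and the outcome (daily cases) before calling `estimate_delay`, so that the patterns of the input do not distort the cross-correlations.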

In our particular case of study, the pre-whiten SDI series/Daily COVID-19 cases cross-correlation graph presented nonzero crosscorrelations with some of them decaying exponentially, indicating the need to add terms in the numerator and denominator of the transfer function model. 7 A parsimonious model was initially chosen for the transfer function 7 :

where B is the backward shift operator and ∇ is the differencing operator.

We used a sliding window of 60 days to train the model and estimate the parameters C, 0 , θ 1 , and δ in the transfer function, and tested the forecasting results on the week immediately following. For the test week, estimated values of the SDI were used as the input in the transfer function to forecast the daily cases, ensuring that information unavailable in a real-life scenario would not be used to produce the results for the test week. Given the time limit discussed previously in this section, the first training window goes from March 5 to May 4 with a test week that encompasses May 5 to May 12. The last training window goes from March 26 to May 25 with a test window that goes from May 26 to June 3. The performance of the procedure for the test week was assessed using the mean absolute percentage error (MAPE), defined as

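$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\left|\mathrm{cases}_i - \widehat{\mathrm{cases}}_i\right|}{\mathrm{cases}_i},$$

where cases_i and \widehat{cases}_i are the observed and forecast numbers of cases on day i of the test week and n is the number of days in the test week. All analyses were performed using SAS software 9.4.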
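As a rough illustration of the evaluation scheme, the sketch below enumerates the train/test windows and computes the MAPE. The function names and the one-day step between consecutive windows are assumptions made for this example. Because the stated window boundaries (eg, March 5 to May 4) are 60 days apart, the training end date is taken as the start date plus 60 days, and the test end date as the test start plus 7 days, which reproduces the first window given in the text.

```python
from datetime import date, timedelta


def mape(actual, forecast):
    """Mean absolute percentage error in percent (assumes no zero actual values)."""
    return 100.0 * sum(abs(a - f) / a
                       for a, f in zip(actual, forecast)) / len(actual)


def sliding_windows(first_start, last_start, train_span=60, test_span=7, step=1):
    """Yield (train_start, train_end, test_start, test_end) date tuples.

    train_end is train_start + train_span days and the test window begins
    the following day. The step of 1 day is an assumption of this sketch.
    """
    start = first_start
    while start <= last_start:
        train_end = start + timedelta(days=train_span)
        test_start = train_end + timedelta(days=1)
        test_end = test_start + timedelta(days=test_span)
        yield start, train_end, test_start, test_end
        start += timedelta(days=step)


# Windows whose training period starts between March 5 and March 26, 2020.
windows = list(sliding_windows(date(2020, 3, 5), date(2020, 3, 26)))
```

For each window, a model would be fitted on the training dates only and `mape` applied to its forecasts for the test dates.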

Based on the full data, we observed that the autocorrelations of the SDI depicted in the ACF plot decayed slowly, indicating the need to difference this series. The ACF of the differenced SDI series tailed off at lags 7k (k = 1, 2, ...), while the PACF cut off at lag 7. These observations led to a simple time-series model composed of a differencing term (d = 1) and an autoregressive (AR) term p = (1,7), which was fitted to the SDI data. The ACF and PACF plots after removing this AR term from the differenced series showed no significant autocorrelations, indicating that this ARIMA model adequately fitted the SDI data and, hence, could be used to pre-whiten it. The analysis of the cross-correlation of the pre-whitened SDI with the daily cases led to a delay s = 9 d (cross-correlation = −.23). Examination of residuals from the initial transfer function model in forecasting COVID-19 cases indicated that an AR term p = (1,7) of the daily COVID-19 cases was also needed in the final transfer function TS model. After adding this autoregressive term to the transfer function model, no more significant correlations appeared in the ACF and PACF, indicating that the resulting model adequately captured the relationship between the SDI and the number of daily COVID-19 cases in Maryland in this time period. We then proceeded to estimate the coefficients of this model for each of the training windows described above. To benchmark the performance, we compared our proposed transfer function model with a simple ARIMA model for forecasting daily cases (ie, excluding the SDI) with d = 1 and autoregressive term p = (1,7).
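To make the benchmark structure concrete, the following sketch fits a model with d = 1 and AR terms at lags 1 and 7 and produces recursive forecasts. It is an approximation under stated assumptions: the coefficients are obtained by ordinary least squares without an intercept, which need not coincide with the estimation method used in SAS, and the names `fit_ar_1_7` and `forecast` are hypothetical.

```python
def difference(x):
    """First difference x[t] - x[t-1] (d = 1)."""
    return [b - a for a, b in zip(x, x[1:])]


def fit_ar_1_7(x):
    """Least-squares fit of d[t] = phi1*d[t-1] + phi7*d[t-7] + e[t], d = diff(x).

    Solves the 2x2 normal equations directly; no intercept is included.
    """
    d = difference(x)
    rows = [(d[t - 1], d[t - 7], d[t]) for t in range(7, len(d))]
    s11 = sum(a * a for a, _, _ in rows)
    s77 = sum(b * b for _, b, _ in rows)
    s17 = sum(a * b for a, b, _ in rows)
    s1y = sum(a * y for a, _, y in rows)
    s7y = sum(b * y for _, b, y in rows)
    det = s11 * s77 - s17 * s17
    phi1 = (s1y * s77 - s7y * s17) / det
    phi7 = (s7y * s11 - s1y * s17) / det
    return phi1, phi7


def forecast(x, phi1, phi7, steps):
    """Recursive forecasts of the original (undifferenced) series."""
    levels = list(x)
    d = difference(levels)
    for _ in range(steps):
        nxt = phi1 * d[-1] + phi7 * d[-7]  # forecast of the next difference
        d.append(nxt)
        levels.append(levels[-1] + nxt)    # integrate back to the level
    return levels[len(x):]
```

The lag-1 term carries the day-to-day dependence and the lag-7 term the weekly cycle, so a 7-day forecast reuses the differences observed during the last week of the training window.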

At least 1 of the parameter estimates associated with the SDI (C, θ 1 , or δ) is statistically significant with a P-value < 0.05 (Table 1), except in the first week and in the training window from March 23 to May 22. Hence, the results in Table 1 indicate that the SDI, included as an input variable with the appropriate delay, can be an important predictor of daily COVID-19 cases. Focusing on the statistically significant results, the positive values for the autoregressive terms indicate that past values of cases lead to a larger number of infections, while the negative values for C paired with the positive estimates for δ show that a larger SDI leads to fewer cases with a 9-d delay.

The MAPE values for test weeks that start up until May 24 vary between 12.2% and 20.1%, as seen in Table 2, and hence fall within or near the range of what would be considered good forecasts. 8 The unevenness of the performance might be attributed to external factors that influenced the number of reported cases on a given day during those earlier days of the pandemic, eg, local weather and scarce availability of tests. The MAPE of the simple ARIMA models that exclude the SDI and rely exclusively on the secular trend and weekly cycle of the number of COVID-19 cases in this period showed even more volatility, varying between 9.5% and 37.8%, and was on average 1.94% worse than that of the proposed transfer function models. The performance of both the transfer function and classic ARIMA models for the week composed completely of days starting on May 25 led to MAPEs of 41.8% and 42.2%, respectively, hinting that this point in time might mark a significant departure in how social distancing relates to COVID-19 cases.

The required AR terms p = (1,7) suggest that the SDI and daily count variables are influenced by their observed or estimated values on the previous day (secular trend) and a week prior (weekly cycle), supporting what has been observed in previous studies, 9 while the delay of 9 d for the COVID-19 cases is within the range of the number of days that symptoms take to appear, 10 indicating not only that our proposed transfer function models provide an adequate prediction performance but also that their characteristics correspond to patterns seen in the pandemic. The degradation of the models' performance after the start of the 2020 national protests attests to the limitation of the current version of the models and points toward the need to redo the process to obtain the optimal values for the (p,d,q) parameters and the delay s instead of just recalculating the coefficients of the models. The study of the P-values seen in Table 1 helps to reinforce conclusions that have been drawn about the COVID-19 pandemic previously, namely that having a large number of cases in a community leads to even more infections and that a decrease in social distancing behavior among the members of a group is associated with an increase in the number of positive cases a few days later. A limitation of this study is its single focus on the SDI; future work might evaluate whether the conclusions reached in this manuscript hold true for other measures of social distancing, such as the Unacast Social Distancing Scorecard.

Although the models described in this report were optimized for the epidemic in Maryland, the steps described here can be used to develop models to forecast the number of COVID-19 cases in other regions several days in advance. Parameters used in this transfer function model will change according to region and time because of modifications to social distancing regulations and other factors (eg, contact tracing), but the transfer functions can include other independent variables in addition to the SDI, hence providing useful information in the debate of economic reopening vs pandemic control for this and future pandemics.

1. Scientific and ethical basis for social-distancing interventions against COVID-19

2. An ongoing repository of data on coronavirus cases and deaths in the

3. An interactive COVID-19 mobility impact and social distancing analysis platform

4. The effect of state-level stay-at-home orders on COVID-19 infection rates

5. Estimation of COVID-19 prevalence in Italy

6. Protests probably didn't lead to coronavirus spikes, but it's hard to know for sure

7. SAS for Forecasting Time Series

8. Industrial and Business Forecasting Methods. London: Butterworth Scientific

9. A seven-day cycle in COVID-19 infection and mortality rates: are inter-generational social interactions on the weekends killing susceptible people? medRxiv

10. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia