key: cord-1001590-o2dnynhd
authors: Parker, Dylan; Pianykh, Oleg
title: Mobility-Guided Estimation of Covid-19 Transmission Rates
date: 2021-01-08
journal: Am J Epidemiol
DOI: 10.1093/aje/kwab001
sha: ad94fb0429d28e61570b626d1100a674d18879ad
doc_id: 1001590
cord_uid: o2dnynhd

It is of critical importance to estimate changing transmission rates and their dependence on population mobility. A common approach to this problem involves fitting daily transmission rates using a Susceptible-Exposed-Infected-Recovered (SEIR) model (regularizing them to avoid overfitting), and then computing the relationship between the estimated transmission rate and mobility. Unfortunately, there are often several very different transmission rate trajectories that fit the reported cases well, meaning that the choice of regularization determines the final solution (and thus the mobility-transmission rate relationship) selected by the SEIR model. Moreover, the classical approaches to regularization (penalizing the derivative of the transmission rate trajectory) do not correspond to realistic properties of pandemic spread. Consequently, models fit using derivative-based regularization are often biased toward underestimating the current transmission rate and future deaths. In this work, we propose mobility-driven regularization of the SEIR transmission rate trajectory. This method rectifies the artificial regularization problem, produces more accurate and unbiased forecasts of future deaths, and estimates a highly interpretable relationship between mobility and the transmission rate. Mobility data for this analysis were collected by Safegraph (San Francisco, CA) from major US cities between March and August 2020.

defined as the number of unique visits to points of interest in a county, as measured by cellphone traffic [1]. The most natural way to estimate the r(t)-to-m(t) relationship would consist of two principal steps: 1) estimating r(t) from an SEIR model fit to death data [2, 3]; 2) comparing the estimated r(t) to the observed m(t). This appears to be the approach taken by several notable pandemic models [4, 5]. Unfortunately, the significant noise in the data often results in several model solutions that fit the same data well but imply starkly different r(t)-to-m(t) relationships (Figure 1). Moreover, these solutions are highly dependent on the (arbitrary) choice of r(t) regularization, which is necessary to avoid model overfitting.

This makes it impossible to discover an accurate, stable mobility-transmission rate relationship by first estimating the transmission rate and then attempting to align it with mobility.

Instead, it is necessary to determine whether there exists a transmission rate trajectory that both fits the data and aligns with the observed mobility trend. In this work, we propose a model that estimates r(t) as a function of m(t), which effectively regularizes r(t) using mobility data instead of artificial, derivative-based constraints. Consequently, the model produces a more accurate, unbiased, and stable solution, even in the presence of the significant variance observed in death data.

Aggregated and anonymized mobility data was obtained from Safegraph through an academic partnership program. We elected to fit the model to county-level death data, 

Below, we provide the specifications for the SEIR model used in the subsequent sections.

In addition to the typical SEIR variables, this model includes the following features: 1) asymptomatic infections [6] and infectious incubation periods [7], both of which are assumed to be 56% [8] as infectious as symptomatic infections; 2) a hospitalization period, modeled as an exponential distribution with a mean of 8 days [9]; and 3) a critical care period, also modeled as an exponential distribution with a mean of 8 days [9]. Since the model described in the next section is fit to death data, the most important trajectory within this model is the path Exposed → Infected → Hospitalized → Critical Care → Death (see Table 1 for all model notations). Note that the use of death data does not change the SEIR model's fundamental assumption that infections drive infections; deaths are simply used as a delayed signal of past infections. The model assumes that deaths occur after infection according to an exponential distribution with a mean of approximately 12 days, and that incubation periods follow an exponential distribution with a mean of 6 days. Thus, on average, mobility on day t is related to deaths on day t+18. The only difference between this approach and a more typical infection-based fitting approach is that model error is calculated with respect to forecasted and observed deaths, rather than forecasted and observed infections. We chose deaths because reported COVID-19 infections depended heavily on testing rates and capacities, which makes infection counts far less reliable, and there was no clear way to correct for this source of bias. We therefore fit to death data, which we believed to be the most consistent and widely available metric of the extent of the epidemic. The University of Washington's IHME has written in greater depth about the relative merits of death data [10].
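The 18-day figure is the sum of the two exponential means (6 + 12). As an illustration (our sketch, not the authors' code), the snippet below assumes the two delays are independent and discretized into whole days, convolves the two daily delay distributions, and recovers a mean lag of about 18 days. The combined delay is not itself exponential; deaths attributable to a given day's mobility are spread over a wide window around that mean.

```python
import numpy as np


def exponential_daily_pmf(mean_days: float, horizon: int) -> np.ndarray:
    """P(k <= delay < k + 1) for k = 0..horizon-1, for an exponential delay."""
    edges = np.arange(horizon + 1)
    cdf = 1.0 - np.exp(-edges / mean_days)
    return np.diff(cdf)


HORIZON = 200  # days; the tail beyond this is negligible for means of 6 and 12

exposed_to_infectious = exponential_daily_pmf(6.0, HORIZON)  # incubation period
infectious_to_death = exponential_daily_pmf(12.0, HORIZON)

# Distribution of the total delay (sum of two independent delays).
total = np.convolve(exposed_to_infectious, infectious_to_death)

# Bin k of each pmf covers [k, k + 1), so its midpoint is k + 0.5; bin k of the
# convolution therefore sits at k + 1 (two half-day offsets).
lag_days = np.arange(total.size) + 1.0
mean_lag = float(np.sum(lag_days * total) / np.sum(total))
print(round(mean_lag, 1))  # prints 18.0
```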

One would expect the relationship between r(t) and m(t) to change over time as social distancing policies are implemented and relaxed. For instance, even if the same number of people visit a grocery store as before the pandemic, they are less likely to become infected because of masks, shorter dwell times, lower store capacity, etc. Our model accounts for this by allowing the r(t)-to-m(t) relationship to evolve over time. Let r₀ be the initial transmission rate and α(t) be the r(t)-to-m(t) factor, which determines how r(t) scales with mobility over time. Our model assumes that the percent change in r(t) equals the product of α(t) and the percent change in m(t):

$$\frac{r(t) - r(t-1)}{r(t-1)} = \alpha(t)\,\frac{m(t) - m(t-1)}{m(t-1)}, \qquad r(0) = r_0.$$
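A minimal sketch of this link, assuming the day-over-day reading of "percent change" (each day's relative change in r(t) is α(t) times that day's relative change in m(t)). The function name is hypothetical; it builds r(t) from r₀ by accumulating the daily factors 1 + α(t)·Δm/m.

```python
import numpy as np


def transmission_from_mobility(r0, alpha, mobility):
    """Daily r(t) whose day-over-day percent change is alpha(t) times that of m(t).

    alpha and mobility are daily arrays of equal length; alpha[0] is unused
    because r(0) is pinned to r0.
    """
    alpha = np.asarray(alpha, dtype=float)
    mobility = np.asarray(mobility, dtype=float)
    pct_change_m = mobility[1:] / mobility[:-1] - 1.0  # percent change in m(t)
    factors = 1.0 + alpha[1:] * pct_change_m           # 1 + percent change in r(t)
    # Assumes every factor stays positive (alpha * pct_change_m > -1).
    return r0 * np.concatenate(([1.0], np.cumprod(factors)))


# A 1% drop in mobility with alpha = 0.89 lowers r(t) by 0.89%.
r = transmission_from_mobility(0.5, np.full(3, 0.89), np.array([100.0, 99.0, 99.0]))
print(r[1] / r[0] - 1.0)  # approximately -0.0089
```

With this reading, α(t) acts as an elasticity: α = 1 means r(t) moves one-for-one with mobility, and α = 0 means mobility has no effect on transmission.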

Changes in the r(t)-to-m(t) factor α(t) are quite gradual, as social distancing behavior is unlikely to change from day to day (i.e., not the number of people who come into contact with each other, but how people behave when they do). Therefore, it is sufficient to estimate α(t) on a weekly basis. This dramatically reduces the computation time needed to fit the model and serves as a natural regularizer.
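Estimating α(t) weekly means the optimizer sees one value per week, held constant across that week's days. A minimal sketch of that expansion (the helper name is hypothetical), which cuts the number of free α values for an n-day fit from n to ⌈n/7⌉:

```python
import math

import numpy as np


def weekly_to_daily(alpha_weekly, n_days):
    """Hold each weekly alpha constant for 7 days and trim to n_days."""
    daily = np.repeat(np.asarray(alpha_weekly, dtype=float), 7)
    if daily.size < n_days:
        raise ValueError("need at least ceil(n_days / 7) weekly values")
    return daily[:n_days]


n_days = 10
n_weeks = math.ceil(n_days / 7)  # 2 free parameters instead of 10
daily_alpha = weekly_to_daily([0.9, 0.7], n_days)
print(n_weeks, daily_alpha)
```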

In addition to fitting α(t) on a weekly basis, we regularize it by penalizing its first derivative. This encourages the model to find as consistent an m(t)-to-r(t) relationship as possible. This is more realistic than the corresponding assumption used by most r(t) estimation models since, while we expect to observe dramatic changes in r(t), we do not expect to observe dramatic changes in the relationship between m(t) and r(t). Together, these penalties produce the following minimization problem. Let D(t) be the number of observed deaths on day t and SEIR(t, r₀, α) be the number of model-forecasted deaths on day t, where r₀ is the initial r(t). We then find the optimal model parameters by minimizing a model error E that combines the mismatch between D(t) and SEIR(t, r₀, α) with a penalty on the week-to-week change in α(t).

Here, λ is the regularization factor. Without any regularization, the model still produces plausible, stable m(t)-to-r(t) relationships for 68% of the counties we processed. However, for a few counties the model terminates at a local minimum corresponding to a highly unstable m(t)-to-r(t) relationship, or even diverges. Our numerical experiments found that any λ ≥ 0.4 consistently prevents the model from diverging. Therefore, to avoid inserting additional bias into the model through stronger regularization, we set λ to its minimum effective value of 0.4. Adding this penalty does not guarantee that the resulting solution is a global minimum. However, it does add significant convexity to the objective function, which was sufficient to keep the optimizer away from local minima for the 2,250 models fit during our experiments. We also obtained faster and more stable results by placing a prior on the initial reproduction number R₀. Most estimates [11] have placed the basic reproduction number between 2.5 and 3, so we set a normal prior with a mean of 2.8 and a standard deviation of 1. Enforcing this prior is equivalent to adding a third term to the objective function:

$$\frac{\left(\tau r_0 - 2.8\right)^2}{2 \cdot 1^2}$$

where τ is the average length of the infectious period, which is needed to convert between the transmission rate and the reproduction number (R₀ = τr₀). In the following experiments, the models are fit by minimizing this error function with SciPy's L-BFGS numerical minimization algorithm.
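The fitting procedure can be sketched end to end, but only illustratively. The toy SEIR-with-deaths simulator, the squared-error data term, the 5-day infectious period τ, the 1% fatality fraction, and all synthetic inputs below are hypothetical stand-ins, not the paper's model (which adds asymptomatic, hospital, and critical-care compartments). The sketch does use the reported settings: λ = 0.4, a first-difference penalty on the weekly α values, a Normal(2.8, 1) prior on R₀ = τr₀, and SciPy's bounded L-BFGS variant, selected with `method="L-BFGS-B"`.

```python
import numpy as np
from scipy.optimize import minimize

TAU = 5.0                        # hypothetical mean infectious period (days)
EXPOSED_DAYS = 6.0               # mean incubation period from the text
LAMBDA = 0.4                     # regularization factor reported in the text
PRIOR_MEAN, PRIOR_SD = 2.8, 1.0  # normal prior on R0 = TAU * r0
IFR = 0.01                       # hypothetical fraction of removed cases who die
POPULATION = 1e6


def simulate_deaths(r0, alpha_weekly, mobility):
    """Toy daily SEIR-D stand-in for SEIR(t, r0, alpha); not the paper's model."""
    n = mobility.size
    alpha = np.repeat(alpha_weekly, 7)[:n]  # weekly alpha held constant per week
    factors = 1.0 + alpha[1:] * (mobility[1:] / mobility[:-1] - 1.0)
    r = r0 * np.concatenate(([1.0], np.cumprod(factors)))
    s, e, i = POPULATION - 10.0, 0.0, 10.0
    deaths = np.empty(n)
    for t in range(n):
        new_exposed = min(r[t] * s * i / POPULATION, s)
        new_infectious = e / EXPOSED_DAYS
        removed = i / TAU
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - removed
        deaths[t] = IFR * removed
    return deaths


def objective(params, observed, mobility):
    r0, alpha_weekly = params[0], params[1:]
    forecast = simulate_deaths(r0, alpha_weekly, mobility)
    fit = np.sum((observed - forecast) ** 2)                    # assumed squared error
    smooth = LAMBDA * np.sum(np.diff(alpha_weekly) ** 2)        # first-derivative penalty
    prior = (TAU * r0 - PRIOR_MEAN) ** 2 / (2 * PRIOR_SD ** 2)  # normal prior on R0
    return fit + smooth + prior


# Synthetic 8-week example: mobility falls during a "lockdown", then recovers.
n_days = 56
days = np.arange(n_days)
mobility = np.interp(days, [0, 14, 28, 55], [100.0, 60.0, 60.0, 80.0])
true_params = np.concatenate(([0.56], np.full(8, 0.9)))
observed = simulate_deaths(true_params[0], true_params[1:], mobility)

x0 = np.concatenate(([0.4], np.full(8, 0.5)))
bounds = [(0.05, 1.5)] + [(0.0, 2.0)] * 8
result = minimize(objective, x0, args=(observed, mobility),
                  method="L-BFGS-B", bounds=bounds)
print(result.x.round(2), result.fun)
```

No analytic gradient is supplied, so SciPy estimates it by finite differences; a production fit would likely benefit from analytic or automatic gradients.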

When forecasting future deaths, our models calculated future transmission rates using the most recent estimate of α(t) and the most recent 7-day moving average of m(t).
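A minimal sketch of this forecasting rule, assuming the day-over-day reading of the r(t)-to-m(t) link: future mobility is held at its latest 7-day average and α at its latest weekly value, so r(t) shifts once (from the last observed day's mobility to the 7-day average) and then stays flat. The function and argument names are hypothetical.

```python
import numpy as np


def forecast_transmission(r_last, m_hist, alpha_latest, horizon):
    """Project r(t) forward with mobility frozen at its latest 7-day average."""
    m_hist = np.asarray(m_hist, dtype=float)
    m_future = m_hist[-7:].mean()  # most recent 7-day moving average
    m_path = np.concatenate(([m_hist[-1]], np.full(horizon, m_future)))
    factors = 1.0 + alpha_latest * (m_path[1:] / m_path[:-1] - 1.0)
    return r_last * np.cumprod(factors)


m_hist = [90.0, 92.0, 94.0, 96.0, 98.0, 100.0, 102.0]
r_future = forecast_transmission(0.4, m_hist, alpha_latest=0.8, horizon=5)
print(r_future)
```

Using the 7-day average rather than the last daily value smooths out day-of-week swings in visit counts.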

When fit to US COVID-19 death data from March through May and used to forecast deaths over the next 5 weeks, the mobility model overestimated deaths by 5.0% among the 25 counties with the highest death totals. This compared favorably with our highest-accuracy non-mobility model (in which r(t) was permitted to vary as needed to fit the death data, with regularization of its first derivative), which underestimated deaths by 32.5% over the same period. Figure 2 compares the relative errors of the non-mobility model and our mobility model for each county, where relative error is defined as the difference between forecasted and observed deaths divided by the total number of deaths in the county. While the non-mobility model systematically underestimated future deaths, the mobility model appeared to be nearly unbiased, with an average error close to zero. The fitted results of our mobility model also appear to be quite stable, meaning that the model only infrequently revised its estimate of the relationship between m(t) and r(t). Among the 25 counties with the highest death totals, the mean change in α(t) for models fit daily between June 1 and June 22 was 1.83%. Figure 4 demonstrates the stability of the LA County fitted results over that period, including forecasts for the following month.
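The per-county relative error can be written directly from its definition. In this sketch (hypothetical function name; the numbers are made up for illustration and are not data from the study), positive values mean the model overestimated deaths. The county total is passed as its own argument because the text does not specify the window it covers.

```python
import numpy as np


def relative_error(forecast, observed, county_total):
    """(forecasted - observed deaths) / total county deaths; > 0 means overestimate."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    county_total = np.asarray(county_total, dtype=float)
    return (forecast - observed) / county_total


# Illustrative numbers only: one county overestimated, one underestimated.
errors = relative_error([110, 90], [100, 100], [200, 400])
print(errors, errors.mean())
```

Normalizing by each county's total keeps large counties from dominating the comparison, and a mean near zero indicates the absence of systematic bias rather than small per-county errors.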

During and immediately following lockdowns in the 25 counties with the most deaths, estimates of the mobility-transmission factor clustered closely around 0.89, meaning that a 1% decrease in mobility was associated with a 0.89% decrease in the transmission rate (Figure 5). As expected, social distancing procedures appeared to decrease the sensitivity of the transmission rate to mobility, to a median value of 0.67 in week 3 post-shutdown. However, the mobility-transmission factor appeared to increase in subsequent weeks as social distancing was relaxed: it was estimated as 0.72 in week 5, 0.76 in week 7, and 0.78 in week 9 post-shutdown. The dispersion in mobility-transmission factors also increased during this period, suggesting that some counties were more effective than others at reopening portions of their economies without dramatically increasing r(t). Several counties, primarily in Massachusetts, appeared to decrease r(t) even as m(t) increased, which could be due to improved contact tracing, isolation, and adherence to social distancing and mask recommendations.

We have proposed a model that uses mobility data to forecast future deaths and estimate the relationship between mobility and the transmission rate. The model finds a stable association between m(t) and r(t) that is conserved across several major counties, and it outperforms non-mobility models when forecasting future deaths. Moreover, the fitted trends in α(t) suggest that public health interventions (e.g., contact tracing, social distancing, mask wearing) initially reduced the sensitivity of the transmission rate to changes in mobility in some counties, but that α(t) increased again as social distancing was relaxed.

decrease (corresponding to the lockdown) followed by a slow rise (corresponding to reopening).

A model that penalizes either the first or second r(t) derivative would prefer the first result to the second. The mobility-based model, described in subsequent sections, produces the second trajectory. While the second solution reflects the likely scenario that r(t) increased in the summer of 2020 as the US economy reopened, the first r(t) suggests that the economic reopening did not influence the transmission rate. 

[1] Safegraph COVID-19 Data Consortium

[2] Advanced Methods for Data Analysis: Smoothing Splines

[3] Models of Infectious Disease

[4] COVID-19 Projections Release Notes

[5] Small-Area Projections of COVID-19 Transmission in the United States

[6] Estimating the undetected infections in the Covid-19 outbreak by harnessing capture-recapture methods

[7] Clinical Questions Regarding COVID-19

[8] Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

[9] Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand

[10] Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries

[11] Preliminary estimates of the reproduction number of the coronavirus disease (COVID-19) outbreak in Republic of Korea and Italy by 5

[12] The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application

[13] COVID Tracking Project Daily State Data

[14] COVID Tracking Project Historical State Data

[15] Mortality rates of patients with COVID-19 in the intensive care unit: a systematic review of the emerging literature

Relative infectivity of undetected and pre-symptomatic infections: set to 0.54

Rate of exposed period: set to 0.2

Proportion of cases undetected

Rate of undetected infection period

Rate of detected infection period

Proportion of detected infections hospitalized

Rate of hospitalization period

Proportion of hospitalized patients admitted to critical care

Rate of critical care period

Death rate among critical care patients