key: cord-0894693-79dzsxpi
authors: Bhouri, M. A.; Costabal, F. S.; Wang, H.; Linka, K.; Peirlinck, M.; Kuhl, E.; Perdikaris, P.
title: COVID-19 dynamics across the US: A deep learning study of human mobility and social behavior
date: 2020-09-23
journal: nan
DOI: 10.1101/2020.09.20.20198432
sha: 856c4b210d9df4e37268535712908ec0ddc3234a
doc_id: 894693
cord_uid: 79dzsxpi

This paper presents a deep learning framework for epidemiology system identification from noisy and sparse observations with quantified uncertainty. The proposed approach employs an ensemble of deep neural networks to infer the time-dependent reproduction number of an infectious disease by formulating a tensor-based multi-step loss function that allows us to efficiently calibrate the model on multiple observed trajectories. The method is applied to a mobility and social behavior-based SEIR model of COVID-19 spread. The model is trained on Google and Unacast mobility data spanning a period of 66 days, and is able to yield accurate future forecasts of COVID-19 spread in 203 US counties within a time-window of 15 days. Strikingly, a sensitivity analysis that assesses the importance of different mobility and social behavior parameters reveals that attendance of close places, including workplaces, residential, and retail and recreational locations, has the largest impact on the basic reproduction number. The model enables us to rapidly probe and quantify the effects of government interventions, such as lock-down and re-opening strategies. Taken together, the proposed framework provides a robust workflow for data-driven epidemiology model discovery under uncertainty and produces probabilistic forecasts for the evolution of a pandemic that can judiciously inform policy and decision making. All codes and data accompanying this manuscript are available at https://github.com/PredictiveIntelligenceLab/DeepCOVID19.

1 Introduction probabilistic characterization of the estimated parameters is required such that extrapolations in time can be provided 74 with quantified uncertainty. 75 From a modeling perspective, the main novelty of this work stems from utilizing mobility and social behavior trend 76 data to model the temporal evolution of the COVID-19 infectious rate -the probability of transmitting disease between 77 a susceptible and an infectious individual (Figure 1) . From a methodological standpoint, this work generalizes the 78 multi-step neural network method of Raissi et al. [26] to enable gradient-based optimization of unknown epidemi-79 ology model parameters with latent variables. This is achieved by effectively back-propagating the influence of the 80 unobserved variables through the disease spread dynamics with an appropriate tensorization of the loss function. This 81 key step enables a computationally efficient treatment of multiple temporal trajectories, as well as the concurrent train-82 ing of multiple neural networks. The latter, allows us to generate a statistical ensemble over the predicted evolution 83 of the underlying disease spread dynamics, which enables future forecasting with quantified uncertainty. In summary, 84 our specific contributions can be organized as follows: 85 • We extend the multi-step neural network method of Raissi et. al.

[26] to enable dynamical system's parameters in-86 ference with unobserved latent variables. Moreover, we tensorize the loss function formulation in order to perform 87 the parameters inference from multiple trajectories of different counties, while having a small memory footprint 88 in terms of the size of the associated computational graph. Hence, the proposed approach is computationally effi-89 cient and enables training of multiple neural networks, which provides an uncertainty quantification estimate of the Our findings put forth a novel, flexible and robust workflow for data-driven epidemiology model discovery that can 102 potentially help health authorities take a more quantitative and judicious approach to policy making in order to slow 103 down COVID-19 propagation. We demonstrate that well-calibrated epidemiology models can indeed return sensible 104 future forecasts. Such capability can directly assist with optimizing resource allocation for more precise diagnostics 105 and treatment, as well as yield a better understanding of the disease spread dynamics. Taken together, this can be an-106 other tool in our arsenal to help us better prepare for COVID-19 and potential future pandemics, and provide additional 107 quantitative tools to help us minimize their spread. 108 This paper is organized as follows. Section 2.1 presents the mobility and social behavior-based SEIR epidemiology 109 model considered to study the spread of COVID-19. Section 2.2 details the proposed computational method for infer-110 ence of the mobility and social behavior-based SEIR model parameters and the corresponding technical ingredients.

In section 3, the effectiveness of the proposed method is demonstrated by testing it on real mobility and infections 112 data from 203 US counties and obtaining reliable future forecasts for a period of 15 days with quantified uncertainty.

Within the same section, we present a sensitivity analysis to assess the influence of the different mobility and social 114 behavior parameters considered in the spread of the disease. Finally, in section 4 we summarize our key findings, 115 discuss the limitations of the proposed approach, and carve out directions for future investigation. 3 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. We consider a mobility and social behavior-based SEIR model detailed as follows:

where S is the fraction of susceptible population, E the fraction of exposed population including individuals who have 120 been infected but are not yet infectious themselves during the incubation period, I the fraction of infectious population,

R the fraction of removed population that accounts for recovered and deceased individuals, and C(t) ≡ R(t) + I(t) 122 is the fraction of cumulative cases, which is the only variable with available data over time. Note that we consider an 123 SEIR model with variables normalized by the number of population such that the resulting model (1) is universal for 124 all regions and independent of the number of their inhabitants.

Since the spread of an infectious disease that is transmitted from human to human depends on interpersonal interac-126 tions, we model the susceptible to infectious transition rate β in equation (1) as a function of mobility and social behav-127 ior parameters denoted as m ∈ R Nm , which contains the mobility and social behavior parameters m 1 (t), . . . , m Nm (t).

This enables determining the evolution of the basic reproduction number, R 0 (t) ≡ β(m(t)) γ , a key paremeter to char-129 acterize the evolution of an epidemic.

The proposed model is trained on data provided by Google for mobility change from baseline [18] (parameters 131 m 1 , . . . , m 6 as discussed below), along with the data provided by Unacast [20] for social behavior trends (param-132 eters m 7 , . . . , m 9 as detailed below) for all 203 US counties considered in this work. Specifically, we consider a total 133 of N m = 9 mobility and social behavior parameters corresponding to: In this work, γ and δ are assumed as given and equal to 1/6.5 [27, 28] and 1/2.5 [27, 11] respectively for COVID -19, 147 and the task is to learn the dynamics of β(t), and, consequently, the evolution of the solution to the SEIR model of 148 equation (1). The learning task can then be formulated as follows.

Given γ, δ and a data-set D for N c different regions containing: (i) the initial conditions of the variable I(t = t 0 ) for 150 each region, (ii) the time trajectories of the observed variable C(t) for each region, and (iii) the time trajectories of the 151 observed mobility and social behavior trends m(t) for each region, infer the dependence of β(t) ≡ β(m(t)) and the 152 initial fractions of the exposed population E(t = t 0 ) for all the different regions.

The observed data-set D can be expressed as follows:

where superscripts refer to the different regions considered. In order to learn the mapping β ≡ β(m(t)), β is modeled 155 with a long-short term memory (LSTM) [29] neural network of the form

where θ N N refers to the weights and bias parameters defining the LSTM network. Since we are modeling the spread 157 of an infectious disease with an incubation period, we utilize the lag structure of the LSTM network to model β as a 158 5 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint function of the mobility and social behavior parameters over a period of time of τ days. The forward pass of LSTM 159 neural network considered in this work is detailed in appendix A. 160 Given the available dataset D, we formulate a multi-step loss function in order to infer the learnable parameters θ given 161 in equation (4), which include both the LSTM parameters θ N N as well as the latent initial fractions of the exposed 162 population for all N c different regions considered,

To formulate the multi-step loss function, we follow the approach put forth by Raissi et al. 

where dt is the time-step size corresponding to the period of the infections number update, and taken equal to 1 day in 169 this work based on the available data. Then, given the estimated I(t) from equation (5) and the parameters θ, we can 170 computeŜ(t), thenÊ(t) and finallyÎ(t) using the remaining equations of the SEIR model (1). Using the trapezoidal 171 rule for instance, such discretization for region j gives:

is the LSTM prediction of β for 173 region j at time instance t i . Note that E j (t 0 ) appearing in the initial conditions in equations (6) is part of the learnable 174 parameters θ, as described in equation (4). Notice that, given the dataset D and the parameters θ, one can estimate 175Ŝ (t),Ê(t), andÎ(t) in that order. Finally, the loss function for a single neural network is expressed as the difference 176 between the variable I(t) estimated only from the available data and the estimated variableÎ(t) which depends on the 177 learnable parameters

One crucial point in terms of implementation consists of the possibility of expressing the finite difference equation of 179 (5), and each of the finite difference equations of (6), as tensors of shape N c × (N t − 1). As a consequence, the loss 180 L given in equation (7) can be expressed as the sum of the elements of a tensor of shape N c × (N t − 1) ensuring 181 a minimalist computational graph, which speeds up gradient back-propagation. We emphasize that the proposed 182 methodology is independent of the chosen finite difference scheme, and the trapezoidal rule is chosen here for sake 183 of presentation and in coherence with the finite difference scheme adopted in this work as justified in appendix A.

This computational strategy allows us to consider multiple trajectories simultaneously (accounting for the different 185 regions considered in the data), instead of performing the learning task using observations from a single or a handful 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint

Optimization algorithm Adam Learning rate 10 −3 Number of iterations 2 × 10 4 County batch size 60 Table 1 : Hyper-parameter settings employed across all numerical studies.

All code and data presented in this section will be made publicly available at https://github.com/ 192 PredictiveIntelligenceLab/DeepCOVID19. The minimization of equation (7) is carried out using stochastic 3.1 Training data and pre-processing 196 We remove the counties containing missing mobility or social behavior parameters for more than 3 consecutive days 197 or 20 days in total over the considered time period, and we use linear interpolation for missing data for 3 or less 198 consecutive days. We also consider the USAFacts processed infections data for all US counties [32]. The two datasets 199 are cross-matched together such that the considered counties satisfy the following criteria:

• the infection rate C(t) is at least equal to 0.2% after t = 81 days of the spread of the disease,

• the mobility and social behavior data for the τ days prior to the first day of infections is available.

For each county, the first day of infections is defined as the first day for which there are at least 10 infections. These and June 12, 2020. The total population of these counties is 150 million, which represents 46% of total US population. The proposed framework is trained on the collected data corresponding to the first 66 days. For the remaining 15 208 days, only the mobility and social behavior data are used in order to extrapolate the fraction of infected population. 209 We train an ensemble of 100 LSTM networks with randomized parameter initializations. The total training time takes 210 14.7 minutes on a single NVIDIA Tesla P100 GPU. Once trained, the model provides estimates of the cumulative 211 cases fraction with a relative error equal to 4.3% on the training data (for the first 66 days), and a relative error equal 212 to 6.6% on the testing data, corresponding to the extrapolation carried out for the remaining 15 days of the data-set. would drive the model to better approximate counties with lower fraction of cumulative cases due to the importance 232 weight introduced by the corresponding small denominator I j (t i ). Since accurately modeling counties with higher 233 disease propagation, and hence higher fraction of cumulative cases, is more crucial than focusing on counties with 234 lower infection rates, we opted for the non-normalized loss function. Note that considering the normalized loss and 235 adding a hyper-parameter within the denominator may partially solve the issue and slightly remedy the disparity in 236 accuracy, however the improvement observed was not significant and it would make the proposed approach a problem-237 dependent method since the optimal hyper-parameter introduced would differ depending on the data-set considered.

New York state has the highest number of counties with a fraction of infected population above 1% after 81 days of the 239 spread of the virus. Figure 5 shows the predictions obtained for the evolution of the fraction of infected population for Figure 8 . 261 We also perform a local sensitivity analysis as follows. We increase each of the mobility and social behavior parameters 262 by 10% for the last 21 days (each at a time), and we determine the corresponding prediction for the susceptible to 263 infectious transition rate β. We then compare the obtained value for β for the last day to the actual one obtained 264 with the unchanged data. Finally, we pick the parameter that corresponds to the highest change in β. This result, 265 shown in Figure 9 , allows us to locally identify the most sensitive parameter among the mobility and social behavior 266 ones considered for each of the 203 counties studied. Interestingly, our results suggest that the spread of COVID-19 267 among a population of over 60 million people is most sensitive to the closure and reopening of workplaces, followed 268 by changes in visits to residential places. 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint Figure 6 : Evolution of the fraction of cumulative cases C(t) and the basic reproduction number R 0 (t) for representative counties across the US. The red and blue horizontal dashed lines indicate the dates corresponding to the large crowd gathering ban and stay at home order issued by each county. The black horizontal dashed line delineates the data used for training the model (days 0-65), and the predicted model forecasts (days 66-80).

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. 12 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. incomplete and imperfect data points; a task that is often viewed as art and has been the source of scientific controversy 297 [37] . In this work, we aspire to develop state-of-the-art machine learning algorithms to accurately determine the 298 dynamic parameters characterizing the evolution of an epidemic, such as the time-dependent variation of the infectious 299 rate and the basic reproduction number. Taking a systematic approach, the proposed model calibration framework will 300 provide a quantitative understanding of the effects and efficiency of various policies designed to contain the spread of 301 13 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint 14 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint

, and produce probabilistic forecasts for the evolution of a pandemic 302 that can judiciously inform policy and decision making [39, 40] . 303 Specific to this study, the proposed approach allows us to understand the most relevant mobility and social behavior 304 parameters that affect the spread of COVID-19. Our results suggest that working from home policies have the signif-305 icant effect in containing the reproduction number. This is shown in both global and local sensitivity analyses, where 306 the mobility parameter that quantifies the amount of trips to workplaces is the most relevant among all mobility and 307 social behavior parameters. Working in confined spaces has been shown to be a risk factor for the transmission of as we would be simply guessing its size. Finally, our model tends to perform relatively better in counties with more 334 cases, as we can see in Figure 4 . This is intrinsically linked to the loss function that we chose to train the model. In 335 our case, this function tends to penalize more counties with more cases. We think this behavior is desirable because CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint Figure 11 : Methodology summary: Using the dataset D and the mobility and social behavior-based SEIR model, the multi-step LSTM parameters along with the initial fractions of the exposed population for the different counties are estimated by minimizing the loss function (7) [43] Derek K Chu, Elie A Akl, Stephanie Duda, Karla Solo, Sally Yaacoub, Holger J Schünemann, Amena El- The forward pass of the LSTM neural network considered in this work for one single evaluation ofβ i (predicted 465 value of β at time instance t i ) is given in Algorithm 1, where σ refers to the sigmoid function, • to the element wise 466 multiplication between two tensors of same shape and (·, ·) to the euclidean inner product between two vectors of 467 same size. Based on the notation used in Algorithm 1, the LSTM parameters θ N N can be expressed as follows: Figure 12 shows the convergence of the loss function of Eq (7) with the number of iterations.

Other loss functions have been investigated, including: 2. a loss based on the L 2 difference between the data of C(t) and the inferred number of cumulative cases:

3. a loss based on the L 2 difference between the increase of number of cumulative cases C(t i ) − C(t i−1 ) and

The loss defined in (7) performs better than the loss mentioned in point 1 because the latter is more prone to local 480 minima if it is not initialized sufficiently close to the exact solution and due to the noisy data and numerical errors, the 481 proposed loss (7) gave considerably better results. The losses in points 2 and 3 require one additional time integration 482 to estimate the variable R(t). Hence, such step introduces additional numerical errors, which explains the better 483 performance observed with the loss defined in (7).

18 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint A PREPRINT -SEPTEMBER 21, 2020

Algorithm 1: LSTM forward pass

m(t i−τ +k ), k = 0, . . . , τ − 1 2 Outputs:β i 3 h −1 , c −1 = 0 4 for k = 0 : τ − 1 do (7) ) 19 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. samplings as an estimator for the parameter β. The relative error averaged over the extrapolation period obtained with 500 such model is shown in the right plot of Figure 13 for the different counties. We also tried several different priors 501 for β, but all of them gave similar results. In general, the simple SEIR model tends to produce cumulative cases 502 fraction variations that are convex within the period of time considered, while the actual trajectories generally change 503 concavity, due to the enforced behavioral intervention policies, such as lock-down and gatherings ban. 504 20 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted September 23, 2020. . https://doi.org/10.1101/2020.09.20.20198432 doi: medRxiv preprint

Unacast social distancing dataset

Global and local mobility as a barometer for COVID-19 dynamics. 409 medRxiv

The mathematics of infectious diseases

Presumed asymp-412 tomatic carrier transmission of COVID-19

Visualizing the invisible: The effect of asymptomatic transmission on the outbreak dynamics of 415 covid-19

The reproduction number of COVID-19 and its correlation with 417 public health interventions. medRxiv

Multistep neural networks for data-driven discovery 419 of nonlinear dynamical systems

Early transmission dynamics in wuhan, china, of novel coronavirus-infected pneu-422 monia

Early 424 release-high contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2

Long short-term memory

A First Course in the Numerical Analysis of Differential Equations

Adam: A method for stochastic optimization

Curating a covid-19 data repository and forecasting county-level 433 death counts in the united states

Factorial Sampling Plans for Preliminary Copmputational Experiments

An effective screening design for sensitivity 437 analysis of large models

World Health Organization et al. Coronavirus disease 2019 (COVID-19): situation report

With COVID-19, modeling takes on life and death importance

Towards epidemic prediction: federal efforts and opportunities in 444 outbreak modeling

Infectious disease threats in the twenty-first century: strengthening the 446 global response

Modelling the global spread of diseases: A review 448 of current practice and capability

Coronavirus disease 450 outbreak in call center

Closed environments facilitate secondary transmission of 453 coronavirus disease 2019 (covid-19)