key: cord-0682518-bg85qud7
authors: Li, Z.; Zheng, Y.; Xin, J.; Zhou, G.
title: A Recurrent Neural Network and Differential Equation Based Spatiotemporal Infectious Disease Model with Application to COVID-19
date: 2020-07-22
journal: nan
DOI: 10.1101/2020.07.20.20158568
sha: 9dd32c7e02ae2fad54025db16b54cf7c1a3c3707
doc_id: 682518
cord_uid: bg85qud7

The outbreaks of Coronavirus Disease 2019 (COVID-19) have impacted the world significantly. Modeling the trend of infection and real-time forecasting of cases can help decision making and control of the disease spread. However, data-driven methods such as recurrent neural networks (RNN) can perform poorly due to limited daily samples in time. In this work, we develop an integrated spatiotemporal model based on the epidemic differential equations (SIR) and RNN. The former, after simplification and discretization, is a compact model of the temporal infection trend of a region, while the latter models the effect of nearest neighboring regions and captures latent spatial information. We trained and tested our model on COVID-19 data in Italy, and show that it outperforms existing temporal models (fully connected NN, SIR, ARIMA) in 1-day, 3-day, and 1-week ahead forecasting, especially in the regime of limited training data.

Susceptible-Infected-Removed (SIR) is a classical differential equation model of infectious diseases [2]. It divides the total population into three compartments and models their evolution by the system of equations

$$\frac{dS}{dt} = -\beta I S, \qquad \frac{dI}{dt} = \beta I S - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$

where $\beta$ and $\gamma$ are two positive parameters. SIR is a simple and efficient model of temporal data for a given region; see also [3] for related compartment models with social structures. Yet infectious disease data are often spatiotemporal, as in the case of COVID-19; see [5].
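To make the SIR dynamics above concrete, the following is a minimal sketch that integrates the normalized system with forward Euler steps of one day. The function name `simulate_sir` and the parameter values are assumptions chosen only for illustration; they are not fitted to any data set discussed in this paper.

```python
def simulate_sir(beta, gamma, i0, days):
    """Forward Euler integration of the normalized SIR system with a 1-day step."""
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(s, i, r)]
    for _ in range(days):
        new_infected = beta * i * s   # flow S -> I during one day
        new_removed = gamma * i       # flow I -> R during one day
        s -= new_infected
        i += new_infected - new_removed
        r += new_removed
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters, chosen only for illustration.
trajectory = simulate_sir(beta=0.3, gamma=0.1, i0=0.001, days=100)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"infected fraction peaks on day {peak_day}")
```

Because each step moves mass between compartments without creating or destroying it, the sum S + I + R stays equal to 1 up to rounding, which mirrors the conservation property of the continuous system.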
A natural question is how to extend SIR to a space-time model of suitable complexity so that it can be quickly trained from the available public data sets and applied in real-time forecasts. See [8] for real-time forecasts of cumulative cases in China in February 2020 with a temporal model. In this paper, we explore spatial infectious disease information to model the latent effect due to the in-flow of infected people from geographical neighbors. The in-flow data is not observed. To this end, machine learning tools such as regression and neural network models are more convenient.

Auto-regressive (AR) models and their variants are linear statistical models for forecasting time-series data. Long Short-Term Memory (LSTM) neural networks [4], widely used in natural language processing, have more representation power and can be applied to disease time-series data as well. With spatial structures added, graph-structured LSTM models can achieve state-of-the-art performance on spatiotemporal influenza data [6] and on crime and traffic data [10, 9]. However, they require a sufficiently large supply of training data. For COVID-19, we only have limited daily data since the outbreaks began in early 2020. Applying space-time LSTM models [6, 9] directly to COVID-19 turns out to produce poor results. In view of the limited COVID-19 data, we propose a hybrid SIR-LSTM model.

In [11], the authors designed a variant of AR, the AutoRegression with Google search data (ARGO), which uses Google search data as an external feature to forecast influenza data from the Centers for Disease Control and Prevention (CDC). ARGO is a linear model that processes historical observations together with Google search trend data correlated with influenza. The prediction of the influenza activity level at time $t$, denoted $\hat{y}_t$, is given by

$$\hat{y}_t = \mu_y + \sum_{j=1}^{52} \alpha_j\, y_{t-j} + \sum_{i=1}^{100} \beta_i\, X_{i,t}.$$

ARGO is optimized as

$$\min_{\mu_y,\,\alpha,\,\beta}\; \sum_t \Big( y_t - \mu_y - \sum_{j=1}^{52} \alpha_j\, y_{t-j} - \sum_{i=1}^{100} \beta_i\, X_{i,t} \Big)^2 + \lambda_\alpha \|\alpha\|_1 + \eta_\alpha \|\alpha\|_2^2 + \lambda_\beta \|\beta\|_1 + \eta_\beta \|\beta\|_2^2,$$

where $\alpha = (\alpha_1, \dots, \alpha_{52})$, $\beta = (\beta_1, \dots, \beta_{100})$, and $\lambda_\alpha$, $\eta_\alpha$, $\lambda_\beta$, $\eta_\beta$ are regularization hyperparameters.
The $y_{t-j}$, $1 \le j \le 52$, are the historical observations of the previous 52 weeks, and the $X_{i,t}$ are the Google search trend measures at time $t$ of the top 100 terms most correlated with influenza. Essentially, ARGO is a linear regression with regularization terms. In [11], ARGO is shown to outperform standard machine learning models such as LSTM, AR, and ARIMA.

In [6], the graph-structured recurrent neural network (GSRNN) further improved on ARGO in forecasting accuracy of the CDC influenza activity level. The CDC partitions the US into 10 Health and Human Services (HHS) regions for reporting. GSRNN treats the 10 regions as a graph with nodes $v_1, \dots, v_{10}$ and edge set $E = \{(v_i, v_j) \mid v_i, v_j \text{ are adjacent}\}$. Based on the average history of activity levels, the 10 HHS regions are divided into two groups: a relatively active group, $H$, and a relatively inactive group, $L$. There are three types of edges, $L$-$L$, $H$-$L$, and $H$-$H$, and each edge type has a corresponding RNN to train the edge features. There are also two node RNNs, one for each group, to output the final prediction. Given a node (region) $v$, suppose $v \in H$. GSRNN generates the edge features of $v$ at time $t$, $e^t_{v,H}$ and $e^t_{v,L}$, by averaging the history of the neighbors of $v$ in the corresponding groups. Next, the edge features are fed into the corresponding edge RNNs:

$$f^t_v = \mathrm{edgeRNN}_{H-H}\big(e^t_{v,H}\big), \qquad h^t_v = \mathrm{edgeRNN}_{H-L}\big(e^t_{v,L}\big).$$

Then the outputs of the edge RNNs are fed into the node RNN of group $H$, together with the node feature of $v$ at time $t$, denoted $v^t$, to output the prediction of the activity level of node $v$ at time $t+1$, $y^{t+1}_v$:

$$y^{t+1}_v = \mathrm{nodeRNN}_H\big(v^t, f^t_v, h^t_v\big).$$

We propose a novel spatiotemporal model integrating LSTM [4] with a discrete-time I-equation derived from the SIR differential equations. The LSTM is used to model latent spatial information. The I-equation models the observed temporal information. Our model, named IeRNN, differs from [6, 10, 9] in that the limited temporal data is fit by a difference equation with 3 parameters (the I-equation), which is far more compact than an LSTM.
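As a point of reference for the ARGO baseline reviewed earlier, the following is a minimal sketch of a penalized linear regression on lagged observations plus external features. It swaps in a plain ridge (L2) penalty so that a closed-form solution exists, and it makes no claim to match the penalty, tuning, or data pipeline of [11]. The helper names `build_design` and `fit_ridge`, the use of 2 lags and 3 features, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def build_design(y, X, n_lags):
    """Rows [1, y[t-1], ..., y[t-n_lags], X[t, :]] paired with targets y[t]."""
    rows = [
        np.concatenate(([1.0], y[t - n_lags:t][::-1], X[t]))
        for t in range(n_lags, len(y))
    ]
    return np.array(rows), y[n_lags:]


def fit_ridge(A, b, lam):
    """Closed-form ridge fit; the intercept column (index 0) is left unpenalized."""
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ b)


# Synthetic, noise-free data (purely illustrative): 2 lags, 3 external features.
rng = np.random.default_rng(0)
n_lags, n_steps = 2, 200
X = rng.normal(size=(n_steps, 3))
true_coef = np.array([0.5, 0.6, -0.2, 1.0, 0.0, -0.5])
y = np.zeros(n_steps)
for t in range(n_lags, n_steps):
    y[t] = (true_coef[0] + true_coef[1] * y[t - 1]
            + true_coef[2] * y[t - 2] + X[t] @ true_coef[3:])

A, b = build_design(y, X, n_lags)
coef = fit_ridge(A, b, lam=1e-8)   # a tiny penalty, so the fit is close to least squares
fitted = A @ coef                  # one-step-ahead fitted values
```

With 52 lags and 100 search terms, as in the description of ARGO, the design matrix would have 153 columns, which is why regularization matters when only a few hundred weekly observations are available.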
Based on the SIR model, we add a feature $I_e$ that represents the external infection influence from the neighbors of a region. The SIR nonlinear system with $I_e$ as external forcing becomes

$$\frac{dS}{dt} = -\beta_1 S I - \beta_2 S I_e, \qquad (1)$$

$$\frac{dI}{dt} = \beta_1 S I + \beta_2 S I_e - \gamma I, \qquad (2)$$

$$\frac{dR}{dt} = \gamma I, \qquad (3)$$

which conserves the total mass (normalized to 1): $S + I + R = 1$. It follows from (3) that

$$R(t) = R(t_0) + \gamma \int_{t_0}^{t} I(\tau)\, d\tau.$$

Hence,

$$S(t) = 1 - I(t) - R(t_0) - \gamma \int_{t_0}^{t} I(\tau)\, d\tau.$$

Substituting $S(t)$ into (2), we have

$$\frac{dI}{dt} = \big(\beta_1 I + \beta_2 I_e\big)\Big(1 - I - R(t_0) - \gamma \int_{t_0}^{t} I(\tau)\, d\tau\Big) - \gamma I.$$

Combining the forward Euler method with a Riemann sum approximation of the integral, we have the discrete approximation

$$I(t+1) = I(t) + \big[\beta_1 I(t) + \beta_2 I_e(t)\big]\Big[1 - I(t) - R(t_0) - \gamma \sum_{j=1}^{p} I(t-j)\Big] - \gamma I(t).$$

As we model $I(t)$ from the beginning of the infection, we have $t_0 = 0$ and $R(t_0) = 0$. We arrive at the following discrete-time I-equation:

$$I(t+1) = (1-\gamma)\, I(t) + \big[\beta_1 I(t) + \beta_2 I_e(t)\big]\Big[1 - I(t) - \gamma \sum_{j=1}^{p} I(t-j)\Big]. \qquad (4)$$

Note that if we let $I_e(t) = 0$, we have an approximation of $I(t)$ for the original SIR model, which is a solely temporal model (named the I-model):

$$I(t+1) = (1-\gamma)\, I(t) + \beta_1 I(t)\Big[1 - I(t) - \gamma \sum_{j=1}^{p} I(t-j)\Big]. \qquad (5)$$

This medRxiv preprint (https://doi.org/10.1101/2020.07.20.20158568; this version posted July 22, 2020) is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

In reality, it is hard to know how the population of a region interacts with the populations of neighboring regions. As a result, $I_e(t)$ is latent information that is difficult to model by a mathematical formula or equation. To retrieve this latent spatial information, we employ recurrent neural networks made of LSTM cells [4]; see Fig. 2. We use the spatial information based on the map of Italian regions, Fig. 1. To learn the latent information $I_e$ of a region $v$, we first generate the edge feature of $v$. Let $C$ be the collection of neighbors of $v$. The edge feature of $v$ at time $t$, $f^t_e$, is then formed from the values $I_i(t)$ of the regions $v_i \in C$, where $I_i(t)$ is the infected population percentage of region $v_i$ at time $t$. We then feed $f^t_e$ into an edge RNN, an RNN with 3 stacked LSTM cells (see Fig. 3), followed by a dense layer that computes $I_e$. The activation function of the dense layer is the hyperbolic tangent.
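The discrete-time I-equation described above can be sketched as a one-step update. The sketch assumes the update takes the form I(t+1) = (1 - γ) I(t) + [β1 I(t) + β2 I_e(t)] [1 - I(t) - γ Σ_{j=1..p} I(t-j)], that is, a forward Euler step with the removed fraction approximated by a Riemann sum over the last p days. The function names and the numerical values are hypothetical, and the external forcing is passed in as a plain number rather than produced by an edge RNN.

```python
def i_equation_step(history, i_ext, beta1, beta2, gamma, p=3):
    """One step of the assumed discrete I-equation.

    history: infected fractions up to time t, most recent last (length >= p + 1)
    i_ext:   external forcing I_e(t); set to 0.0 to obtain the purely temporal I-model
    """
    if len(history) < p + 1:
        raise ValueError("need at least p + 1 past values")
    i_t = history[-1]
    lagged_sum = sum(history[-(p + 1):-1])          # I(t-1) + ... + I(t-p)
    susceptible = 1.0 - i_t - gamma * lagged_sum    # S(t) from S + I + R = 1
    return (1.0 - gamma) * i_t + (beta1 * i_t + beta2 * i_ext) * susceptible


def rollout(history, i_ext_seq, beta1, beta2, gamma, p=3):
    """Apply the step repeatedly, feeding each prediction back into the history."""
    values = list(history)
    for i_ext in i_ext_seq:
        values.append(i_equation_step(values, i_ext, beta1, beta2, gamma, p))
    return values[len(history):]


# Hypothetical parameters and inputs, for illustration only.
next_without_forcing = i_equation_step([0.01] * 4, 0.0, beta1=0.2, beta2=0.1, gamma=0.1)
next_with_forcing = i_equation_step([0.01] * 4, 0.02, beta1=0.2, beta2=0.1, gamma=0.1)
forecast = rollout([0.01] * 4, [0.0] * 7, beta1=0.2, beta2=0.1, gamma=0.1)
```

Setting `i_ext` to zero recovers the I-model, so the same function covers both the coupled and the purely temporal case.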
Figure 4 illustrates the procedure of computing $I_e$, using Lazio as an example. We call our model IeRNN due to its integrated design of the I-equation and the edge RNN.

To calibrate IeRNN, we use the Italian COVID-19 data [5] for training and testing. Although the US has the most infected cases, its recovered cases are largely missing. The Italian COVID-19 data, on the other hand, is more accurately reported and better maintained, reflecting a nearly complete duration of the rise and fall of infection. We collect the data of daily new (current) cases from 2020-02-24 to 2020-06-18 for 20 Italian regions. We set $p = 3$ in (4) based on experimental performance. As a result, we have current data for 113 days, with 81 days to train our model and 32 days to test it (roughly a 70%/30% training/testing split).
Our training loss function is the mean squared error between the model output and the training data:

$$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \big(\hat{I}(t) - I(t)\big)^2,$$

where $\hat{I}(t)$ is the model output, $I(t)$ is the observed value, and $T$ is the number of training days. Since training minimizes the above loss function over the parameters of both the I-equation and the RNN, the two components of IeRNN are coupled while learning from data. We use Adam gradient descent to learn the weights of the LSTM and the dense layer, as well as the I-equation parameters $\beta_1$, $\beta_2$, and $\gamma$.

To evaluate the performance of our model, we compare IeRNN, the I-model (5), a fully connected neural network (fcNN, Fig. 5) with hyperbolic tangent activation, and an auto-regressive model (ARIMA). As the standard setting of ARIMA is 1-day ahead prediction, we only compare with it in that very short-term case. Since infectious disease evolution is intrinsically nonlinear, we compare the nonlinear models for 3-day and 1-week ahead forecasting. Based on experimental performance, we set the number of hidden units to 100, 150, and 100 for the three layers of fcNN, respectively.

As we see in Fig. 6, fcNN can perform poorly. This is not a surprise, as both [6] and [11] relied on hundreds of historical observations to train their models. The I-model, based only on the sequential data of one region, merely follows the trend of the true data and cannot provide accurate predictions. Our IeRNN model, with the help of additional spatial information, makes accurate predictions and outperforms the other models. We also test IeRNN with the training data reduced to 40% (46 days). IeRNN is still able to track the general trend of the infected population percentage.
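The training procedure described above (mean squared error minimized with Adam) can be illustrated on the two-parameter I-model alone. This is a minimal sketch under stated assumptions: it uses the assumed update I(t+1) = (1 - γ) I(t) + β I(t) [1 - I(t) - γ Σ_{j=1..p} I(t-j)], a hand-written Adam loop in NumPy with analytic gradients, and a synthetic series generated from hypothetical parameters. It does not train an LSTM and is not the authors' implementation.

```python
import numpy as np


def loss_and_grad(theta, series, p=3):
    """MSE of one-step I-model predictions and its gradient w.r.t. (beta, gamma)."""
    beta, gamma = theta
    i_t = series[p:-1]                                            # I(t)
    lag_sum = sum(series[p - j:-1 - j] for j in range(1, p + 1))  # sum_j I(t-j)
    susceptible = 1.0 - i_t - gamma * lag_sum
    pred = (1.0 - gamma) * i_t + beta * i_t * susceptible
    resid = pred - series[p + 1:]                                 # compare with I(t+1)
    d_beta = i_t * susceptible                                    # d pred / d beta
    d_gamma = -i_t - beta * i_t * lag_sum                         # d pred / d gamma
    grad = 2.0 * np.array([np.mean(resid * d_beta), np.mean(resid * d_gamma)])
    return np.mean(resid ** 2), grad


def fit_adam(series, theta0, steps=2000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam updates with bias-corrected first and second moments."""
    theta = np.array(theta0, dtype=float)
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for k in range(1, steps + 1):
        _, grad = loss_and_grad(theta, series)
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        theta -= lr * (m / (1 - b1 ** k)) / (np.sqrt(v / (1 - b2 ** k)) + eps)
    return theta


# Synthetic series from hypothetical parameters (beta, gamma) = (0.3, 0.1).
values = [0.001] * 4
for _ in range(60):
    i_t, lag_sum = values[-1], sum(values[-4:-1])
    values.append(0.9 * i_t + 0.3 * i_t * (1.0 - i_t - 0.1 * lag_sum))
series = np.array(values)

theta0 = (0.1, 0.05)
initial_loss, _ = loss_and_grad(np.array(theta0), series)
theta_hat = fit_adam(series, theta0)
final_loss, _ = loss_and_grad(theta_hat, series)
print(f"loss {initial_loss:.2e} -> {final_loss:.2e}, theta = {theta_hat}")
```

In the full model the same loss would also be differentiated with respect to the LSTM and dense-layer weights, which in practice is left to an automatic differentiation framework rather than hand-derived gradients.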
We measure the test accuracy with the Root Mean Square Error (RMSE) averaged over a few trials in training. In Tables 1 and 2, on the 1-day ahead forecast, IeRNN achieves the smallest RMSE and the I-model has the largest. The compact I-model with 2 parameters cannot do 1-day ahead prediction as accurately. ARIMA outperforms the I-model and does better than fcNN on the Emilia-Romagna and Lazio regions. ARIMA, a linear model, has a simpler structure than fcNN, whose nonlinearity does not play out in such a short-time task. Fig. 8 shows the 1-day ahead forecast of the IeRNN model on other regions, with the learned latent external forcing $I_e$ in Fig. 9.

Table 7: Average model training (tr) and inference (inf) times for the 70% and 40% training data cases.

Model     tr70     inf70     tr40     inf40
IeRNN     0.58s    0.018s    0.51s    0.02s
I-model   0.14s    0.004s    0.11s    0.004s
fcNN      0.09s    0.003s    0.09s    0.003s
ARIMA     0.23s    0.014s    0.19s    0.015s

In model training for multi-day ahead forecasts, the training loss function is modified so that the model input comes from multiple days in the past. In the 7-day ahead forecast, IeRNN leads the other two nonlinear models, especially in the 40% training data case, by as much as a factor of 7 in Lombardy. In the 3-day ahead forecast, IeRNN leads fcNN by a factor of 4 in the 40% training data case, and by as much as a factor of 10 in Lazio. Figs. 10-13 show the model comparison in the training and forecast phases for Lombardy and Lazio. IeRNN (fcNN) has about 16400 (1800) parameters.
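For reference, the accuracy measure used in these comparisons, RMSE averaged over several training trials, can be computed as in the following minimal sketch. The function names and the toy numbers are assumptions made for illustration and are not values from the tables.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mean_rmse(trial_predictions, observed):
    """Average the RMSE over several independently trained trials."""
    return sum(rmse(pred, observed) for pred in trial_predictions) / len(trial_predictions)


# Toy numbers, for illustration only.
observed = [0.0, 0.0]
trials = [[2.0, 2.0], [4.0, 4.0]]
average_error = mean_rmse(trials, observed)
```

Averaging over trials reduces the effect of random weight initialization on the reported error, which matters when the test window is only a few weeks long.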
The optimized parameters are $(\beta_1, \beta_2, \gamma) = (0.685, 0.158, 0.044)$ in Lombardy, with similar values in other regions. Table 7 lists average model training and inference times.

We developed a novel spatiotemporal infectious disease model consisting of a discrete epidemic equation for the region of interest and RNNs for interactions with nearest geographic regions. Our model can be trained in under 1 second. Its inference takes a fraction of a second, making it suitable for real-time applications. Our model outperforms temporal models in one-day and multi-day ahead forecasts in the limited training data regime. In future work, we shall consider social and control mechanisms [1, 7] to strengthen the I-equation, as well as traffic data to expand interaction beyond nearest neighbors.

The work was partially supported by NSF grants IIS-1632935 and DMS-1924548. JX would like to thank Prof. Fred Wan for helpful communications on disease modeling.
Figure 11: Training and 7-day ahead forecast of 3 models (IeRNN, fcNN, and I-model) with reduced (40%) training data, in 3 rows respectively.
[1] Control with uncertain data of socially structured compartmental epidemic models
[2] Infectious Diseases of Humans: Dynamics and Control
[3] The mathematics of infectious diseases
[4] Long short-term memory
[5] Italian region. Italian COVID-19 Data
[6] A study on graph-structured recurrent neural networks and sparsification with application to epidemic forecasting
[7] Optimal, near-optimal, and robust epidemic control
[8] Real-time forecasts of the COVID-19 epidemic in China from
[9] Graph-based deep modeling and real time forecasting of sparse spatiotemporal data
[10] Deep learning for real-time crime forecasting and its ternarization
[11] Accurate estimation of influenza epidemics using Google search data via ARGO