key: cord-0112518-hk5hn6h5
authors: Long, Jie; Khaliq, Abdul; Furati, Khaled
title: Identification and prediction of time-varying parameters of COVID-19 model: a data-driven deep learning approach
date: 2021-03-17
journal: nan
DOI: nan
sha: 387f4a128f84ddd9ce7fa530711dd7918b037978
doc_id: 112518
cord_uid: hk5hn6h5

Data-driven deep learning provides efficient algorithms for parameter identification of epidemiology models. Compared with constant parameters, identifying time-varying parameters is considerably more complex. In this paper, a variant of the physics-informed neural network (PINN) is adopted to identify the time-varying parameters of the Susceptible-Infectious-Recovered-Deceased (SIRD) model for the spread of COVID-19 by fitting daily reported cases. The learned parameters are verified by using an ordinary differential equation solver to compute the corresponding solutions of this compartmental model. The effective reproduction number based on these parameters is calculated. A Long Short-Term Memory (LSTM) neural network is employed to predict the future weekly time-varying parameters. The numerical simulations demonstrate that PINN combined with LSTM yields accurate and effective results.

The novel SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) has spread all over the world since its discovery at the end of 2019, with millions of confirmed cases and more than a million deaths [29]. On March 11, 2020, the World Health Organization (WHO) characterized COVID-19 as a pandemic [28]. Because of the severe negative effects of COVID-19, precautionary measures such as face masks, contact tracing, and social distancing, together with governmental actions such as lockdowns, have been aggressively implemented worldwide. It is therefore important to analyze the dynamics of COVID-19 so that the effectiveness of these measures can be assessed.

Epidemiological models provide an efficient tool for determining and explaining the dynamics of disease transmission. One of the earliest classical models is the Susceptible-Infectious-Recovered (SIR) model presented by Kermack and McKendrick in 1927 [13]. This compartmental model computes the theoretical number of susceptible [...] identification of these weekly time-series parameters is validated, LSTM is implemented to predict future parameters. In this way, by working with weekly values instead of daily ones, LSTM can still accurately predict future parameters. Another reason to predict the parameters rather than forecast the infectious cases directly is that we can obtain the corresponding reproduction number R_0, an essential threshold that reflects the speed of disease transmission and can be used to verify the effectiveness of the measures taken to curb the outbreak of COVID-19. Figure 1 illustrates the workflow: PINN approximates the parameters of the SIRD model by training on the collected data, LSTM then predicts the future values of the parameters, and the predicted parameters are substituted into the SIRD model to predict the infectious cases.

Outline of the paper: In Section 2 we introduce the SIRD model and explain model parameters (β, γ, and µ). Following that, we introduce the effective reproduction number and how to compute it. Deep learning neural networks, including feedforward neural network (FNN), PINN, and LSTM, are described briefly in Section 3. In Section 4, the collected data is described, and the loss function of PINN is formulated. After that, we present LSTM neural networks for the future prediction of parameters. The simulation results are discussed in Section 5.

In this section, we introduce the SIRD model employed in our study. One premise of the SIRD model is that natural birth and death rates are neglected or assumed equal, so that the total population is constant. The population is divided into four mutually exclusive groups: susceptible, infected, recovered, and deceased. The reproduction number is also introduced in this section.

We consider the SIRD model described by the following system of ordinary differential equations [5] :
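Written for population fractions and consistent with the term-by-term description that follows (new infections β(t)S(t)I(t), recoveries γ(t)I(t), deaths µ(t)I(t)), a standard form of system (1), reconstructed here as an assumption rather than quoted, is:

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta(t)\,S(t)\,I(t),\\
\frac{dI}{dt} &= \beta(t)\,S(t)\,I(t) - \gamma(t)\,I(t) - \mu(t)\,I(t),\\
\frac{dR}{dt} &= \gamma(t)\,I(t),\\
\frac{dD}{dt} &= \mu(t)\,I(t).
\end{aligned}
```

The four right-hand sides sum to zero, so S + I + R + D stays constant, matching the constant-population premise.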

where S(t), I(t), R(t), and D(t) are the numbers of susceptible, infected, recovered, and deceased individuals, respectively, and N is the total population, N = S(t) + I(t) + R(t) + D(t). Because N is large, working with raw counts can be numerically difficult in some optimization problems. We therefore use the fraction of the population in each compartment, that is, S(t), I(t), R(t), and D(t) are divided by N. This simplifies the computation while preserving the dynamics of the epidemic. The parameter β(t) represents the average number of contacts per day made by an infected individual. Two essential assumptions are made. First, contacts between infected and uninfected people are sufficient to spread the disease. Second, the population is homogeneously mixed. Hence, on average, each infected individual infects β(t)S(t) susceptible people per day, and β(t)S(t)I(t) is the number of newly infected people per day. The parameter γ(t) is the rate at which infected individuals recover, so γ(t)I(t) infected individuals move into the recovered compartment per day. Similarly, µ(t) is the disease mortality rate, and µ(t)I(t) infected individuals die per day.
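A minimal Python sketch of the normalization by N and of the right-hand side described above, assuming the standard fraction-form SIRD equations. The function names and the example numbers (a population of 1000 with 10 infected) are hypothetical illustrations, not the paper's code or data.

```python
from typing import Tuple

# (S, I, R, D), either as counts or as fractions of the total population N
State = Tuple[float, float, float, float]


def to_fractions(counts: State, population: float) -> State:
    """Divide each compartment count by the total population N."""
    s, i, r, d = counts
    return (s / population, i / population, r / population, d / population)


def sird_rhs(state: State, beta: float, gamma: float, mu: float) -> State:
    """Time derivatives of (S, I, R, D), assuming the fraction-form SIRD equations."""
    s, i, _, _ = state
    new_infections = beta * s * i  # newly infected per day
    recoveries = gamma * i         # infected -> recovered per day
    deaths = mu * i                # infected -> deceased per day
    return (-new_infections,
            new_infections - recoveries - deaths,
            recoveries,
            deaths)


# Hypothetical example: 10 infected out of a population of 1000.
state = to_fractions((990.0, 10.0, 0.0, 0.0), population=1000.0)
ds, di, dr, dd = sird_rhs(state, beta=0.3, gamma=0.1, mu=0.01)
```

Here β S = 0.297 exceeds γ + µ = 0.11, so dI/dt is positive, and the four derivatives sum to zero because the total population is conserved.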

An essential threshold for understanding the transmission rate is characterized by the reproduction number, which estimates the number of new infections caused by a single infected individual [25]. Assume that just one person is infected at the beginning; then in the SIRD model I_0 = 1, S_0 = N − 1, and nobody has recovered yet. In terms of population fractions, I_0 ≈ 0 and S_0 ≈ 1. From the second equation of eq. (1), we have:
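Assuming the standard fraction-form SIRD equations, factoring I(t) out of the infected-compartment equation gives a growth equation of the following form (a reconstruction of eq. (2)):

```latex
\frac{dI}{dt} = \bigl(\beta(t)\,S(t) - (\gamma(t) + \mu(t))\bigr)\,I(t)
```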

where β(t), γ(t), and µ(t) are positive. From eq. (2), it follows that dI/dt > 0 whenever β(t)S(t) > γ(t) + µ(t), which means that the number of infected cases keeps increasing while this condition holds.
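Because γ(t) + µ(t) > 0 and I(t) > 0, the growth condition can be divided through and written as a ratio exceeding one. Assuming the usual definitions, which is how eq. (3) is read here, this gives:

```latex
R_e(t) = \frac{\beta(t)\,S(t)}{\gamma(t) + \mu(t)}, \qquad
R_0(t) = \frac{\beta(t)}{\gamma(t) + \mu(t)}, \qquad
\frac{dI}{dt} > 0 \iff R_e(t) > 1 \quad (I(t) > 0)
```

Since S(t) ≤ 1 for population fractions, R_e(t) ≤ R_0(t) at every t.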

A key idea about the reproduction number R_0 is that it measures secondary infections at the beginning of the disease invasion [9]. After the invasion, the effective contact rate β(t)S(t) decreases because the susceptible group shrinks: those who are infected or have already recovered cannot be infected again. Therefore, a similar threshold, the effective reproduction number R_e [22], is used to indicate whether the disease keeps spreading, and R_0 is an upper bound of R_e [12].
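A short Python sketch of how R_e and its upper bound R_0 could be evaluated pointwise from learned parameter series, assuming R_e(t) = β(t)S(t)/(γ(t) + µ(t)) and R_0(t) = β(t)/(γ(t) + µ(t)). The function names and the three sample values are hypothetical.

```python
from typing import List, Sequence


def effective_reproduction_number(beta: Sequence[float], gamma: Sequence[float],
                                  mu: Sequence[float], s: Sequence[float]) -> List[float]:
    """R_e(t) = beta(t) S(t) / (gamma(t) + mu(t)) at each time point (assumed form)."""
    if not len(beta) == len(gamma) == len(mu) == len(s):
        raise ValueError("all series must have the same length")
    return [b * si / (g + m) for b, g, m, si in zip(beta, gamma, mu, s)]


def basic_reproduction_number(beta: Sequence[float], gamma: Sequence[float],
                              mu: Sequence[float]) -> List[float]:
    """R_0(t) = beta(t) / (gamma(t) + mu(t)), i.e. R_e with S(t) = 1."""
    return [b / (g + m) for b, g, m in zip(beta, gamma, mu)]


# Hypothetical learned values at three time points.
beta, gamma, mu = [0.30, 0.25, 0.12], [0.10] * 3, [0.01] * 3
s = [0.99, 0.95, 0.90]
r_e = effective_reproduction_number(beta, gamma, mu, s)
r_0 = basic_reproduction_number(beta, gamma, mu)
```

In this example R_e falls from about 2.7 to below 1, and every R_e value stays below the corresponding R_0 because S(t) ≤ 1.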

In this section, we provide a brief description of the deep learning neural network architectures utilized in our algorithm for parameter identification and disease dynamics prediction.

The feedforward neural network (FNN) is an artificial neural network with a simple architecture in which the connections between the nodes do not form a cycle. An FNN approximates a target function by recursively applying weighted sums and activation functions to the input, producing a value or vector that is close to the target values. The input layer consists of input neurons that bring the training data into the network for further processing by the hidden layers. The layers between the input layer and the output layer are hidden layers, which apply linear or nonlinear transformations to the data to produce the intended output; the nonlinearity comes from the activation functions. Each neuron in the first hidden layer computes a weighted sum of the inputs plus a bias and passes it through an activation function; these outputs then serve as the input of the second hidden layer, and so on. Finally, the output layer, the last layer of the network, presents the desired results [6].

The forward propagation from one layer to the next layer's nodes is given as follows [7] .
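Writing w_{ij}^l for the weight from node i in layer l − 1 to node j in layer l, one standard form of this layer-to-layer map, assumed here as a reconstruction of eq. (4), is:

```latex
z_j^{l} = f\!\left(\sum_{i=1}^{n_{l-1}} w_{ij}^{l}\, z_i^{l-1} + b_j^{l}\right),
\qquad j = 1, \dots, n_l
```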

where n_l is the number of neurons in the l-th layer, z_i^{l−1} (i = 1, …, n_{l−1}) denotes the output of the i-th node in the (l − 1)-th layer, f is the activation function, w_{ij}^l is the weight between node i and node j of consecutive layers, and b^l is the bias in the l-th layer. The activation functions employed in this study are the hyperbolic tangent function

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad (5)$$

and the sigmoid function

$$\sigma(x) = \frac{1}{1 + e^{-x}}. \qquad (6)$$

Once the output values are obtained from the output layer, they are used to build a loss function that measures the error of the model. An optimizer, such as Adam [14] or the gradient descent method [1], is employed to minimize the loss function so that the output is close to the observations. The choice of optimizer and loss function depends on the problem.
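The forward pass and loss described above can be sketched in plain Python as follows. The layer sizes mirror the PINN architecture described later (input t, 4 hidden layers of 60 neurons, tanh hidden activations, sigmoid output), while the 7 outputs (β, γ, µ, S, I, R, D), the Gaussian initialization, and the random seed are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[j][i], biases[j])


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random Gaussian weights scaled by 1/sqrt(n_in); zero biases."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def forward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """tanh on hidden layers, sigmoid on the output layer."""
    z = list(x)
    for k, (w, b) in enumerate(layers):
        # weighted sum of the previous layer's outputs plus bias, for each node j
        a = [sum(w_ji * z_i for w_ji, z_i in zip(row, z)) + b_j for row, b_j in zip(w, b)]
        act = sigmoid if k == len(layers) - 1 else math.tanh
        z = [act(v) for v in a]
    return z


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error used as a loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Input t, 4 hidden layers of 60 neurons, 7 outputs (beta, gamma, mu, S, I, R, D).
rng = random.Random(0)
sizes = [1, 60, 60, 60, 60, 7]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
out = forward([0.5], layers)
```

The sigmoid output keeps every prediction in (0, 1), which suits rates and population fractions; in practice a deep learning library would handle the optimization step.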

The physics-informed neural network (PINN) is a data-driven algorithm for approximating the solution of differential equations and identifying their parameters [20]. It can use any type of neural network architecture, such as an FNN or a convolutional neural network (CNN), as its main framework. The activation functions and optimization methods used in PINN are the same as in standard deep learning. The distinctive part of this algorithm is the loss function, which generally comprises terms for the boundary conditions, the initial values, and the physical constraints.

The outputs of the neural network are constrained to satisfy the system of differential equations by adding the residuals of the differential equations to the loss function as a penalty. In these residuals, the derivatives of the outputs with respect to time are computed by automatic differentiation [17], which acts as a black box. Automatic differentiation is employed to train these models and allows PINN to take derivatives with respect to the input coordinates, so that it can discover the latent physical laws. More details can be found in [20].
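PINN implementations usually rely on reverse-mode automatic differentiation inside a deep learning library. The sketch below instead uses forward-mode dual numbers, a different but equally exact form of automatic differentiation, purely to show that the derivative of a network output with respect to its input t is computed exactly rather than by finite differences. The tiny 1-2-1 network and its weights are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Dual number val + der*eps with eps**2 = 0; `der` carries d(value)/dt."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def dtanh(x: Dual) -> Dual:
    """tanh with chain rule: d tanh(x) = (1 - tanh(x)^2) dx."""
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)


def tiny_net(t: Dual) -> Dual:
    """Hypothetical network u(t) = sum_j v_j * tanh(w_j * t + b_j) + c."""
    w, b, v, c = [1.5, -0.7], [0.2, 0.4], [0.8, -1.1], 0.05
    return sum((vj * dtanh(wj * t + bj) for wj, bj, vj in zip(w, b, v)), Dual(c))


t0 = 0.3
u = tiny_net(Dual(t0, 1.0))  # seed dt/dt = 1
du_dt = u.der                # exact du/dt at t0, usable in an ODE residual
```

A residual such as du/dt minus the model right-hand side can then be formed from `u.val` and `u.der` without any discretization error in the derivative.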

As an advanced tool for processing and predicting time series, LSTM is a special type of recurrent neural network (RNN). An RNN differs from a feedforward neural network in that it can store past information by feeding the output of the previous time step back in as input at the next time step. Moreover, an RNN uses the same parameters at every time step, because it performs the same task for each element of the sequence; hence the name recurrent. Based on this feature, an RNN can predict sequential information. However, because it relies only on the output of the previous step, a standard RNN has difficulty with long-term memory problems [10]. LSTMs were designed to solve them. As shown in Figure 2, LSTMs are chains of memory cells containing three gates, namely the input gate, the forget gate, and the output gate, as explained in [4]. The key part of these memory cells is the cell state, which can maintain global information about the sequence at each time step. The content of the cell state is modified at each time step through the forget gate and the input gate. More specifically, the forget gate uses the sigmoid function to decide what information is thrown away, while the input gate combines the sigmoid (eq. (6)) and tanh (eq. (5)) functions to decide what new information is stored. The cell state is then updated and passed forward to the next time step, after which the output is calculated in the output gate. This structure shows that LSTMs can store long-term memory by writing new information into the cell state.
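A single LSTM time step with the three gates described above can be sketched as follows. To keep it short, this assumes a scalar input and a scalar hidden and cell state; the parameter names (`wf`, `uf`, `bf`, …) and the example weights are hypothetical, not trained values.

```python
import math
from typing import Dict, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              p: Dict[str, float]) -> Tuple[float, float]:
    """One LSTM time step with scalar input and scalar hidden/cell state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate values
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g   # drop part of the old state, add new information
    h = o * math.tanh(c)     # output (hidden state) for this time step
    return h, c


def run_lstm(xs: Sequence[float], p: Dict[str, float]) -> Tuple[float, float]:
    h, c = 0.0, 0.0
    for x in xs:  # the same parameters p are reused at every time step
        h, c = lstm_step(x, h, c, p)
    return h, c


names = ["wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo"]
params = dict.fromkeys(names, 0.5)  # hypothetical, untrained weights
h_last, c_last = run_lstm([0.30, 0.28, 0.25], params)
```

With a forget gate saturated near 1 and an input gate near 0, the cell state passes through a step unchanged, which is the mechanism that lets an LSTM retain long-term information.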

In this section, we describe the collected data and introduce our algorithms for parameter identification and prediction. In our work, the parameters of the SIRD model are treated as either daily-varying or weekly-varying to capture the complex real-world situation of COVID-19. Accordingly, two algorithms are employed to identify the daily and weekly time-series parameters. Using the parameters identified by PINN as input, we then introduce the LSTM algorithm for making predictions.

The data considered in our study is downloaded from https://covidtracking.com/data/download for the states of New York, New Mexico, and Texas. It comprises cumulative infected, recovered, and deceased cases for the period from March 30 to September 30, 2020. A plot of the data is shown in Figure 3. As can be observed, the three states have different dynamics between the compartments. In New York, the numbers of cases in the I, R, and D compartments all increase, but the numbers of recovered and deceased cases are relatively small compared with infections. The New Mexico data show that the infected and recovered cases increased at almost the same rate before August; after that, infections started decreasing slowly while the number of recovered individuals kept increasing at a high rate. In Texas, the infected population started growing fast in the second half of June and then declined after about 30 days.

In this subsection, we describe how to identify parameters of the SIRD model by utilizing PINN.

The PINN used to learn the parameters is built on an FNN architecture with 4 hidden layers of 60 neurons each. The activation functions of the hidden layers and the output layer are the hyperbolic tangent function and the sigmoid function, respectively. Figure 4 shows the basic framework (the FNN) and the loss function of PINN in our study. The loss function of PINN has two parts, labeled GE loss and OB loss in this figure. The GE loss represents the residuals obtained from the SIRD model eq. (1) by subtracting the right-hand side from the left-hand side:
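Assuming the standard fraction-form SIRD equations, and writing hats for network outputs, subtracting each right-hand side from its left-hand side gives four residuals. The mean-squared aggregation over N_t training times t_k shown here is an assumed form of the GE loss:

```latex
\begin{aligned}
e_S &= \frac{d\hat S}{dt} + \hat\beta\,\hat S\,\hat I, &
e_I &= \frac{d\hat I}{dt} - \hat\beta\,\hat S\,\hat I + (\hat\gamma + \hat\mu)\,\hat I,\\
e_R &= \frac{d\hat R}{dt} - \hat\gamma\,\hat I, &
e_D &= \frac{d\hat D}{dt} - \hat\mu\,\hat I,
\end{aligned}
\qquad
\text{GE loss} = \frac{1}{N_t}\sum_{k=1}^{N_t}\left(e_S^2 + e_I^2 + e_R^2 + e_D^2\right)\Big|_{t = t_k}
```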

The OB loss is the mean squared error between the outputs of the neural network and the data. We employ the Adam algorithm, a first-order gradient-based method for optimizing stochastic objective functions [14], to update the parameters of the neural network by minimizing the loss function.
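The Adam update rule can be sketched in a few lines. The β1, β2, and ε defaults below are the commonly used values from the Adam paper; the larger learning rate of 0.05 and the two-variable quadratic loss are hypothetical choices that simply make the example converge quickly.

```python
import math
from typing import Callable, List


def adam_minimize(grad: Callable[[List[float]], List[float]], x0: List[float],
                  lr: float = 0.05, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 2000) -> List[float]:
    """Minimize a function given its gradient using the Adam update rule."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimate of the gradient
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        for k in range(len(x)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)  # bias-corrected moments
            v_hat = v[k] / (1 - beta2 ** t)
            x[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def quad_grad(x: List[float]) -> List[float]:
    """Gradient of the hypothetical loss (x0 - 3)^2 + 10 * (x1 + 1)^2."""
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


x_opt = adam_minimize(quad_grad, [0.0, 0.0])
```

Dividing by the square root of the second moment rescales each coordinate separately, so the badly scaled second term does not force a tiny global learning rate.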

Algorithm 1 describes how to use PINN to identify the daily time-varying parameters. The input is time t, and the outputs are the three parameters (β, γ, µ) and the four compartments (S, I, R, D) of the SIRD model.

Algorithm 1: PINN for daily time-varying parameters
  Initialize the weights w and biases b randomly
  for each training iteration do
    OB loss: the difference between the output of the neural network and the observation data
    GE loss: the SIRD model residual loss
    loss = OB loss + GE loss
    Update the weights and biases with the Adam optimizer by minimizing the loss
  end
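The loss evaluation inside the training loop of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions: the residuals follow the fraction-form SIRD equations, the two terms are summed without weights as in loss = OB loss + GE loss, the dictionary keys are hypothetical, and central finite differences stand in for the automatic differentiation a real PINN would use.

```python
from typing import Dict, List, Sequence


def central_diff(u: Sequence[float], dt: float) -> List[float]:
    """Derivative on a uniform grid (at least 2 points): central inside, one-sided at ends."""
    inner = [(u[k + 1] - u[k - 1]) / (2 * dt) for k in range(1, len(u) - 1)]
    return [(u[1] - u[0]) / dt] + inner + [(u[-1] - u[-2]) / dt]


def mean_sq(xs: Sequence[float]) -> float:
    return sum(x * x for x in xs) / len(xs)


def pinn_loss(out: Dict[str, List[float]], data: Dict[str, List[float]], dt: float) -> float:
    """Total loss = OB loss + GE loss for SIRD outputs sampled on a time grid."""
    S, I, R, D = out["S"], out["I"], out["R"], out["D"]
    b, g, m = out["beta"], out["gamma"], out["mu"]
    # OB loss: mean squared misfit on each observed compartment
    ob = sum(mean_sq([p - q for p, q in zip(out[key], data[key])]) for key in data)
    # GE loss: residuals of the SIRD equations (left side minus right side)
    dS, dI, dR, dD = (central_diff(u, dt) for u in (S, I, R, D))
    n = len(S)
    e_S = [dS[k] + b[k] * S[k] * I[k] for k in range(n)]
    e_I = [dI[k] - b[k] * S[k] * I[k] + (g[k] + m[k]) * I[k] for k in range(n)]
    e_R = [dR[k] - g[k] * I[k] for k in range(n)]
    e_D = [dD[k] - m[k] * I[k] for k in range(n)]
    ge = mean_sq(e_S) + mean_sq(e_I) + mean_sq(e_R) + mean_sq(e_D)
    return ob + ge


# Hypothetical check: constant S = 1, I = 0.1 on a 3-point grid with beta = 0.5.
steps = 3
out = {"S": [1.0] * steps, "I": [0.1] * steps, "R": [0.0] * steps, "D": [0.0] * steps,
       "beta": [0.5] * steps, "gamma": [0.0] * steps, "mu": [0.0] * steps}
data = {"I": [0.1] * steps, "R": [0.0] * steps, "D": [0.0] * steps}
total = pinn_loss(out, data, dt=1.0)
```

Here the outputs match the data exactly, so only the GE term is nonzero: a constant I(t) is inconsistent with β = 0.5, and the residuals e_S = 0.05 and e_I = −0.05 contribute 0.005 to the loss.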

Algorithm 2 describes the approach for identifying the weekly-varying parameters. We divide the data used in Algorithm 1 into weekly intervals and then train on each interval to identify the corresponding parameters. Cubic spline interpolation is employed to obtain an adequate training set for the neural network training.

Algorithm 2: PINN for weekly time-varying parameters
  for each week do
    Build the loss function components:
    for each training iteration do
      OB loss: the difference between the output of the neural network and the observation data
      GE loss: the SIRD model residual loss
      loss = OB loss + GE loss
      Update the weights and biases with the Adam optimizer by minimizing the loss
    end
    Save the values of β, γ, and µ
  end
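The cubic spline step in Algorithm 2 densifies sparse samples into a larger training set. The paper does not state which boundary condition was used, so the sketch below assumes a natural spline (zero second derivative at both ends); a library routine such as SciPy's `CubicSpline`, whose default is the not-a-knot condition, would be the usual alternative. The weekly values are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def natural_cubic_spline(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (x[k], y[k]); x strictly increasing, at least 2 points."""
    n = len(x) - 1
    h = [x[k + 1] - x[k] for k in range(n)]
    # Tridiagonal system for the second derivatives M[0..n]; natural ends fix M[0] = M[n] = 0.
    sub = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    sup = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for k in range(1, n):
        sub[k] = h[k - 1]
        diag[k] = 2.0 * (h[k - 1] + h[k])
        sup[k] = h[k]
        rhs[k] = 6.0 * ((y[k + 1] - y[k]) / h[k] - (y[k] - y[k - 1]) / h[k - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n + 1):
        factor = sub[k] / diag[k - 1]
        diag[k] -= factor * sup[k - 1]
        rhs[k] -= factor * rhs[k - 1]
    M: List[float] = [0.0] * (n + 1)
    M[n] = rhs[n] / diag[n]
    for k in range(n - 1, -1, -1):
        M[k] = (rhs[k] - sup[k] * M[k + 1]) / diag[k]

    def evaluate(t: float) -> float:
        k = min(max(bisect_right(x, t) - 1, 0), n - 1)  # interval index, clamped at the ends
        hk = h[k]
        left, right = x[k + 1] - t, t - x[k]
        return ((M[k] * left ** 3 + M[k + 1] * right ** 3) / (6.0 * hk)
                + (y[k] / hk - M[k] * hk / 6.0) * left
                + (y[k + 1] / hk - M[k + 1] * hk / 6.0) * right)

    return evaluate


# Hypothetical weekly samples of a compartment fraction, interpolated to daily values.
weeks = [0.0, 7.0, 14.0, 21.0]
values = [0.0010, 0.0025, 0.0060, 0.0090]
interp = natural_cubic_spline(weeks, values)
daily = [interp(float(d)) for d in range(22)]
```

The spline passes through every sample and is twice continuously differentiable, which keeps the interpolated training targets smooth enough for the ODE residual.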

With the parameters learned by PINN, an LSTM implemented in Keras is employed to predict the future parameters, as depicted in Figure 5. First, we normalize these parameters and create new datasets with multiple inputs and one output: the inputs are the parameters at the previous three time steps, and the output is the parameter at the next time step. The data is then used to train the LSTM neural network. We use three hidden layers of 80 nodes each to predict the parameters for the next four weeks.
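The data preparation described above (normalization, then three inputs mapped to one output) and a four-week forecast can be sketched as follows. Min-max scaling and recursive forecasting (feeding each prediction back as input) are assumptions, since the exact choices are not stated; `stand_in` is a hypothetical linear extrapolator used only so the sketch runs without Keras, and a trained LSTM would take its place.

```python
from typing import Callable, List, Sequence, Tuple


def minmax_scale(xs: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale to [0, 1]; also return (lo, span) to undo the scaling later."""
    lo, hi = min(xs), max(xs)
    span = hi - lo if hi > lo else 1.0
    return [(v - lo) / span for v in xs], lo, span


def make_windows(series: Sequence[float], n_in: int = 3) -> Tuple[List[List[float]], List[float]]:
    """Supervised pairs: n_in consecutive values as input, the following value as target."""
    X = [list(series[k:k + n_in]) for k in range(len(series) - n_in)]
    y = [series[k + n_in] for k in range(len(series) - n_in)]
    return X, y


def forecast(history: Sequence[float], predict_next: Callable[[List[float]], float],
             horizon: int = 4, n_in: int = 3) -> List[float]:
    """Recursive multi-step forecast: each prediction is fed back as the newest input."""
    window = list(history[-n_in:])
    preds: List[float] = []
    for _ in range(horizon):
        nxt = predict_next(window)
        preds.append(nxt)
        window = window[1:] + [nxt]
    return preds


def stand_in(window: List[float]) -> float:
    """Hypothetical linear extrapolation, NOT an LSTM."""
    return window[-1] + (window[-1] - window[0]) / 2.0


# Hypothetical weekly beta values.
weekly_beta = [0.30, 0.28, 0.25, 0.22, 0.20, 0.19, 0.18]
scaled, lo, span = minmax_scale(weekly_beta)
X, y = make_windows(scaled)
next_four = [lo + span * p for p in forecast(scaled, stand_in)]
```

Keeping `lo` and `span` makes it possible to map the scaled predictions back to parameter values before they are substituted into the SIRD model.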

In this section, we present and discuss our simulation results. The accuracy of the learned parameters by PINN is validated using the relative error in the numerical solution of System (1), obtained using the Fourth Order Runge-Kutta (RK4) method, with respect to the exact solution. Furthermore, the relative error in the learned solution by PINN with respect to the data is examined. We start by presenting the results for the learned daily and weekly time-varying parameters followed by the associated reproduction numbers. Then, predictions of infectious cases are provided.

In general, daily-varying parameter values reflect the daily data better than single constant average values. However, identifying daily-varying parameters is traditionally costly and adds to the complexity of the problem. PINN provides an efficient approach to overcome this difficulty. On the other hand, our simulations show that learning weekly-varying parameters yields precisely identified parameters and more stable approximation results.

Algorithms 1 and 2 yield the identified parameters and the values of the SIRD compartments learned by PINN. We use the RK4 method to solve the SIRD model with the learned parameters and then compare the solutions of the ODE system (1) with the learned values.
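A minimal sketch of this verification step: classical RK4 applied to the fraction-form SIRD system with one parameter triple per day, plus a relative error. Holding the parameters constant within each day, using 10 substeps per day, and measuring the relative error in the L2 norm are assumptions made for illustration. The usage check sets β = 0, where I(t) = I(0)·exp(−(γ + µ)t) is known exactly.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, ...]  # (S, I, R, D) as population fractions


def sird_rhs(y: State, beta: float, gamma: float, mu: float) -> State:
    s, i, _, _ = y
    new_inf = beta * s * i
    return (-new_inf, new_inf - (gamma + mu) * i, gamma * i, mu * i)


def rk4_sird(y0: State, beta: Sequence[float], gamma: Sequence[float],
             mu: Sequence[float], substeps: int = 10) -> List[State]:
    """Classical RK4, one day per parameter entry; parameters held fixed within a day."""
    h = 1.0 / substeps
    y = tuple(y0)
    traj = [y]
    for b, g, m in zip(beta, gamma, mu):
        for _ in range(substeps):
            k1 = sird_rhs(y, b, g, m)
            k2 = sird_rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k1)), b, g, m)
            k3 = sird_rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k2)), b, g, m)
            k4 = sird_rhs(tuple(yi + h * ki for yi, ki in zip(y, k3)), b, g, m)
            y = tuple(yi + h / 6.0 * (k1i + 2.0 * k2i + 2.0 * k3i + k4i)
                      for yi, k1i, k2i, k3i, k4i in zip(y, k1, k2, k3, k4))
        traj.append(y)  # state at the end of each day
    return traj


def relative_error(approx: Sequence[float], ref: Sequence[float]) -> float:
    """Relative L2 error ||approx - ref|| / ||ref|| (one common definition)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(approx, ref)))
    den = math.sqrt(sum(b * b for b in ref))
    return num / den


# Usage check with beta = 0, where I(t) = I(0) * exp(-(gamma + mu) t) exactly.
days = 10
traj = rk4_sird((0.9, 0.1, 0.0, 0.0), [0.0] * days, [0.1] * days, [0.05] * days)
exact_I = [0.1 * math.exp(-0.15 * t) for t in range(days + 1)]
err = relative_error([y[1] for y in traj], exact_I)
```

RK4 also preserves the linear invariant S + I + R + D = 1 up to rounding, which is a cheap sanity check on any solution computed with learned parameters.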

In Figures 6, 7, and 8, we present the values learned by PINN (NN), the solution of the ODE system (1) (ode), and the reported data (data). The relative errors with respect to the reported data are also computed. There are some large relative errors at the early stage of the outbreak, which could be due to the small size of the recovered and deceased populations at that time. However, the relative errors of the solution of the SIRD model stay within a reasonable range, which indicates that the identified parameters are reliable. For the weekly-varying parameters, the period is divided into weeks, and the learned results are presented in Figures 9, 10, and 11. The relative errors in Figures 9, 10, and 11 are smaller than those in Figures 6, 7, and 8. Thus, we infer that approximating the parameters as constant within each week is more accurate and stable.

Using eq. (3), the effective reproduction number can be obtained from the learned daily and weekly varying parameters. Figure 12 shows the effective reproduction number in the three states. The daily and weekly time-varying reproduction numbers show similar trends, which supports, to some extent, the feasibility and effectiveness of our methods. For New York, R_e is greater than 1 over the whole period, indicating that the number of infectious cases keeps increasing. For New Mexico, R_e decreases to 1 from August, and the infected population decreases during that period. For Texas, R_e is too large at the beginning because the parameter identification in that period is not ideal; after that period, R_e returns to normal and oscillates between 0 and 4. In addition, Figure 3 shows that the infected population in New York grows faster than in the other two states.

The LSTM is employed to predict the future values of the parameters. Figure 13 presents the predictions of infected cases for the next four weeks. The prediction for New York is close to the original trend of infectious cases. The predictions for New Mexico generally agree with the real situation but cannot capture small fluctuations. For Texas, there is a sharp increase and decrease in infectious cases at the end of September, which our prediction does not capture.

We introduced a data-driven deep learning approach based on the physics-informed neural network to solve epidemiological models and identify weekly and daily time-varying parameters. PINN was used for parameter identification and for solving the ODE system. LSTM was used to predict the weekly time-varying parameters, and the predicted parameters were then substituted into the SIRD model to obtain the future trend of COVID-19. The results and errors show that the weekly time-varying parameters provide a good fit to the real data. The algorithms could be further developed to achieve more accurate approximations for complex problems. We intend to explore other PINN architectures in future work.

Backpropagation and stochastic gradient descent method

A simulation of a COVID-19 epidemic based on a deterministic SEIR model

A Time-dependent SIR model for COVID-19 with undetectable infected persons

Comparison of Long Short Term Memory Networks and the Hydrological Model in Runoff Simulation

Modelling provincial COVID-19 epidemic data in Italy using an adjusted time-dependent SIRD model

Deep Learning

A training algorithm for binary feedforward neural networks

SARS epidemical forecast research in mathematical model

Public health interventions and epidemic intensity during the 1918 influenza pandemic

Long short-term memory

Analysis of COVID-19 spread in South Korea using the SIR model with time-dependent parameters and deep learning

Meso-scale modeling of COVID-19 spatiotemporal outbreak dynamics in Germany, medRxiv

A contribution to the mathematical theory of epidemics

Adam: A method for stochastic optimization

A conceptual model for the outbreak of Coronavirus disease 2019 (COVID-19) in Wuhan, China with individual reaction and governmental action

Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data

DeepXDE: A deep learning library for solving differential equations

First-principles machine learning modelling of COVID-19

Design of a nonlinear model for the propagation of COVID-19 and its efficient nonstandard computational implementation

Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations

On parameter estimation approaches for predicting disease transmission through optimization, deep learning and statistical inference methods

Unraveling R0: Considerations for Public Health Applications

Predictions for COVID-19 with deep learning models of LSTM

Stochastic dynamic model of SARS spreading

Reproduction numbers of infectious disease models

Phase-adjusted estimation of the number of coronavirus disease

Spatial dynamics of an epidemic of severe acute respiratory syndrome in an urban area

Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions

Deep learning methods for forecasting covid-19 time-series data: A comparative study