key: cord-1015884-n11gqg64
authors: Farooq, Junaid; Bazaz, Muhammad Abid
title: A Deep Learning algorithm for modeling and forecasting of COVID-19 in five worst affected states of India
date: 2020-09-30
journal: nan
DOI: 10.1016/j.aej.2020.09.037
sha: 4dc23c523a6e3f9837a2642837781c8fb6fa255f
doc_id: 1015884
cord_uid: n11gqg64

In this paper, deep learning is employed to propose an Artificial Neural Network (ANN) based online incremental learning technique for developing an adaptive and non-intrusive analytical model of the Covid-19 pandemic and analysing the temporal dynamics of disease spread. The model intelligently adapts to new ground realities in real time. This eliminates the need to retrain it from scratch every time a new data set is received from the continuously evolving training data. The model is validated with historical data, and a 30-day forecast of disease spread is given for the five most affected states of India.

Covid-19 is a highly contagious epidemic disease caused by the novel coronavirus (SARS-CoV-2). It originated in Wuhan, Hubei Province, China, in late December 2019. The World Health Organization (WHO) declared Covid-19 a pandemic on 12th March 2020 [1]. Researchers and policy makers are working round the clock to find solutions and design strategies to control the pandemic and minimize its impact on human health and the economy.

SARS-CoV-2 is transmitted among humans mostly through respiratory droplets (produced by sneezing, coughing and talking) and through contaminated surfaces [2]. A notable property of SARS-CoV-2 is that it can persist on a variety of surfaces for periods ranging from hours to 9 days at room temperature, which makes its transmission more rapid [3]. The virus can cause Acute Respiratory Distress Syndrome (ARDS) or multiple organ dysfunction, which may lead to physiological deterioration and death of an infected individual [4].

Mathematical modeling of infectious diseases and epidemics has been an important tool for analysing disease characteristics and investigating disease spread ever since the groundbreaking work of Kermack and McKendrick in 1927 [5]. It supports efficient decision making and optimal policy framing. Different models have been developed to analyse the transmission dynamics of many infectious diseases, such as malaria (Ronald Ross model) [6], cholera (Capasso and Paveri-Fontana model, 1979) [7], gonorrhea (Hethcote and Yorke model, 1984) [7], Ebola [8] and H1N1 [9]. Different mathematical models have been developed for Covid-19 as well [10] [11].

In this work, deep learning is employed to propose an Artificial Neural Network (ANN) based online incremental learning technique. It estimates the parameters of a data-stream-guided analytical model of Covid-19 to aid optimal policy formulation, efficient decision making, forecasting and simulation. Modeling and simulation of such problems pose an additional challenge: the training data evolve continuously, and the model parameters change over time depending on external factors. The main contribution of this work concerns this setting of continuously evolving training data. Unlike typical deep learning techniques, the model eliminates the need to retrain or rebuild it from scratch every time a new training data set is received.

Official Covid-19 data from the five worst hit states of India have been taken as the case study. The first case of COVID-19 in India was reported on 30th January 2020 and was linked to Wuhan, China [12]. As of 27 June 2020, the total number of cases reported in India was 508,953, with 295,881 recoveries and 15,685 deaths [13]. Hence the number of active cases was 197,387. An overview of the spread of COVID-19 across the different states of India is shown in Figure 1, and the growth of Covid-19 in India is plotted in Figure 2. These figures show that the disease has spread all over the country and that the worst hit states are Maharashtra, Delhi, Tamil Nadu, Gujarat and Uttar Pradesh. India comprises a total of 28 states and 8 union territories, all of which are affected by the Covid-19 pandemic.

The government of India imposed a country wide complete lockdown on 24th March 2020 with strict restrictions on the movement of people while allowing only the essential services to operate under the supervision of administration and health May 2020 and has been relaxed since 8 June 2020 [14] .

India is the second most populous country in the world, with a total population of around 1.35 billion. Health care facilities in India are considered poor, with 0.55 hospital beds per thousand people [15]. The Covid-19 pandemic has therefore emerged as a major challenge for the people, health workers and policy makers of the country.

The paper is organised as follows. Section 2 discusses an analytical epidemiological model for the Covid-19 infectious disease. Section 3 deals with the adaptive deep learning technique for estimating and updating the model parameters. Section 4 presents simulation results for a 30-day-ahead forecast of disease spread in the five worst affected states of India.

The pioneering work on mathematical models of infectious diseases was carried out by Kermack and McKendrick [5] and is known as the susceptible-infectious-recovered (SIR) model. As one of the most classical models, it has been used by many researchers to study and analyse infectious diseases such as seasonal flu [16, 17], pandemic flu [18, 19], HIV/AIDS [20] and SARS [21, 22]. These studies have shown that SIR models are reliable for analysing the spread of an infectious disease and for evaluating the impact of prevention schemes in different scenarios. The basic SIR model is described by the following differential equations:

$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$

where $N = S + I + R$ is the population size, $\beta$ is the rate of infection and $\gamma$ is the rate of recovery. The basic SIR model can be modified in various ways to accommodate different scenarios. A modified SIR model known as the SIRD (Susceptible-Infected-Recovered-Deceased) model is of interest here. It is based on the following assumptions:

(i) Unlike a typical non-lethal SIR model, this model accounts for fatalities: there is a positive probability of an infected person dying, P(Death) > 0.

(ii) A typical SIR model assumes that the Recovered group gains full immunity from reinfection. This model, however, accommodates the possibility of a recovered person being reinfected, with probability of reinfection P(Reinfection).
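The two assumptions above can be made concrete with a small simulation. This is only a sketch under stated assumptions. The rate names `beta` (infection), `gamma` (recovery), `mu` (death) and `rho` (loss of immunity, i.e. reinfection) are illustrative choices. The same applies to the one-day forward-Euler step. Neither is taken from the paper's own equations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    S: float  # susceptible
    I: float  # infected
    R: float  # recovered
    D: float  # deceased


def sird_step(x: State, beta: float, gamma: float, mu: float, rho: float, dt: float = 1.0) -> State:
    """One forward-Euler step of an assumed SIRD model with reinfection.

    beta: infection rate, gamma: recovery rate, mu: death rate,
    rho: hypothetical rate at which recovered people become susceptible again.
    """
    n_alive = x.S + x.I + x.R          # deceased no longer mix with the population
    new_inf = beta * x.S * x.I / n_alive
    new_rec = gamma * x.I
    new_dead = mu * x.I                # P(Death) > 0 when mu > 0
    lost_imm = rho * x.R               # reinfection pathway R -> S
    return State(
        S=x.S + dt * (-new_inf + lost_imm),
        I=x.I + dt * (new_inf - new_rec - new_dead),
        R=x.R + dt * (new_rec - lost_imm),
        D=x.D + dt * new_dead,
    )


def simulate(x0: State, days: int, **rates: float) -> List[State]:
    """Return the trajectory [x0, x1, ..., x_days]."""
    traj = [x0]
    for _ in range(days):
        traj.append(sird_step(traj[-1], **rates))
    return traj
```

For example, `simulate(State(S=999_000, I=1_000, R=0, D=0), days=30, beta=0.3, gamma=0.1, mu=0.01, rho=0.0)` gives a 30-day trajectory. Setting `rho=0.0` corresponds to the zero-reinfection assumption that this paper adopts for Covid-19. The total S + I + R + D stays constant because every outflow from one compartment is an inflow to another.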

The next task is to learn the model parameters, which can be quite challenging in an epidemic scenario like Covid-19 because the model parameters are expected to change with time. This section proposes an Artificial Neural Network (ANN) based Adaptive Incremental Learning technique (ANNAIL) for online learning of the SIRVD model parameters under the following assumptions:

(i) The rate of infection as a function of time (t) is the major challenge for parameter learning. It is affected by external factors such as the degree of social distancing and lockdowns. During a lockdown, the rate of infection decreases exponentially.

Therefore, to take into account both the lockdown and no-lockdown scenarios, the rate of infection has been modelled as:

where $t_l$ is the time at which the lockdown begins. Therefore, the learning algorithm has to learn 3 parameters.
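One common way to encode "constant before the lockdown, exponential decay afterwards" is the piecewise function below. This is a sketch under that assumption only. The parameter names `beta0` (pre-lockdown infection rate) and `k` (decay constant) are hypothetical. The paper's exact expression is not reproduced here.

```python
import math


def infection_rate(t: float, beta0: float, k: float, t_l: float) -> float:
    """Assumed time-varying infection rate.

    Before the lockdown start t_l the rate stays at beta0; from t_l onward it
    decays exponentially with the hypothetical decay constant k >= 0.
    """
    if t < t_l:
        return beta0
    return beta0 * math.exp(-k * (t - t_l))
```

With `k = 0` the function reduces to the no-lockdown case, so one expression covers both scenarios. The value is also continuous at `t = t_l`, which avoids an artificial jump in the simulated epidemic.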

The rate of reinfection has been assumed to be zero for Covid-19. This assumption rests on how the immune system responds to a viral infection. The human body first mounts a non-specific innate response using neutrophils, macrophages and dendritic cells. This is followed by a more specific adaptive response, in which proteins called immunoglobulins are produced that act as antibodies binding specifically to the virus. In addition, T-cells are formed, which generate cellular immunity by identifying and eliminating the cells infected with the virus. Generally, a sufficient level of such antibodies, together with cellular immunity, prevents reinfection after recovery. Although not every recovered patient develops complete immunity, most do [23]. In the case of Covid-19, research is still ongoing, but promising studies suggest that nearly all recovered patients develop antiviral immunoglobulin-G (IgG) antibodies and are immune to reinfection [24]. This is the basis for the mass serological testing for Covid-19 practised by many governments across the world. In such testing, blood samples are checked for these antibodies as an indicator of present or past Covid-19 infection. Moreover, even if reinfection occurs in rare cases, data on it are not available.

For a typical neural network, or any other technique of model parameter estimation, training data are required first to train the model before it is applied to future scenarios. However, in the case of an epidemic like Covid-19, the training data evolve continuously with time and the model needs to be trained and

Deep learning and other machine learning techniques stand out in data-based model parameter estimation owing to their state-of-the-art results. However, they suffer from catastrophic forgetting: when a trained network is updated only on newly arrived data, its performance on earlier data degrades. Typical neural networks therefore have to be retrained on the entire dataset each time a new training data set becomes available. This is exactly the situation in epidemic modelling, where training data arrive incrementally over time. To address such issues, different incremental learning algorithms have been suggested in the literature [25, 26, 27].

Incremental learning refers to an online learning technique of continuous model adaptation under a scenario of continuously evolving training data. Consequently, previously observed data need not be stored or accessed each time a new data set is received, as in the case of an epidemic like Covid-19. To adapt the model parameters to new data, there is no need to rebuild the model from scratch using all the previously accumulated data. Rather, the learning network modifies the previous hypothesis to adapt to the new data chunk.

In this paper, hypothesis generation via an Artificial Neural Network (ANN) is proposed. Let $D_{j-1}$ be the data set received between times $t_{j-1}$ and $t_j$, and let $h_{j-1}$ be the hypothesis generated on this data set. The hypothesis $h_j$ for a new data set $D_j$, received between times $t_j$ and $t_{j+1}$, is a function of $D_j$ and $h_{j-1}$ only:

$$h_j = f_{NN}(h_{j-1}, D_j)$$

The experience gained from this step is stored and integrated to support the future adaptation process. The objective is to integrate the previously learned knowledge with each new raw data set, adapting the model parameters accordingly. This experience accumulates over time, increasing the model's efficiency, accuracy and flexibility.
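The recursion $h_j = f_{NN}(h_{j-1}, D_j)$ can be illustrated with a deliberately simple stand-in. In this sketch, the "hypothesis" is the weight vector of a linear model. Each new chunk refines it with a few stochastic-gradient passes over that chunk only. This is an illustrative substitute for the ANN, not the authors' network. The function names, learning rate and epoch count are assumptions.

```python
from typing import List, Sequence, Tuple

Hypothesis = List[float]  # [bias, w_1, ..., w_n] of a linear model (stand-in for the ANN)
Chunk = Sequence[Tuple[Sequence[float], float]]  # (features, target) pairs of one data set D_j


def predict(h: Hypothesis, x: Sequence[float]) -> float:
    return h[0] + sum(w * xi for w, xi in zip(h[1:], x))


def update(h_prev: Hypothesis, chunk: Chunk, lr: float = 0.1, epochs: int = 50) -> Hypothesis:
    """Hypothetical f(h_{j-1}, D_j): adapt the previous hypothesis using only the new chunk."""
    h = list(h_prev)  # copy, so h_{j-1} itself is left unchanged
    for _ in range(epochs):
        for x, y in chunk:
            err = predict(h, x) - y  # derivative of 0.5 * err**2 w.r.t. the prediction
            h[0] -= lr * err
            for i, xi in enumerate(x, start=1):
                h[i] -= lr * err * xi
    return h


def learn_stream(chunks: Sequence[Chunk], n_features: int) -> List[Hypothesis]:
    """Process D_1, D_2, ... in arrival order; no earlier chunk is ever revisited."""
    h: Hypothesis = [0.0] * (n_features + 1)
    history: List[Hypothesis] = []
    for chunk in chunks:
        h = update(h, chunk)
        history.append(h)
    return history
```

Memory use here is independent of how many chunks have arrived. Only the current hypothesis is carried forward, which is the property the text relies on.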

The proposed framework for the above problem is shown in Figure 4. The ANN uses a non-linear activation function to perform the regression. The hidden layers are represented by the function $f_{NN}$. As the data stream arrives, weight distribution functions are generated to describe the learning capability of the ANN. The decision boundary is adjusted to focus especially on the hard-to-learn data examples. The algorithm for this framework is given in Algorithm 1.

This algorithm runs with a top-down and horizontal signal flow, as shown in Figure 4. Its adaptive nature stems from the ANN-based mapping function. The mapping function estimates the initial distribution function $\hat{P}_{t-1}$ for $D_t$. It also provides a quantitative indication of how well the previously trained model can learn the new data set. $\hat{P}_{t-1}$ is applied to the new data set to find the pseudo-error $\varepsilon$. Thus a hard-to-learn example will have a higher $\varepsilon$ in step (iii) of the learning procedure and will in turn receive a higher weight in step (iv). This ensures the adaptive nature of the algorithm.

(2) Learning Procedure

where $f_{NN}$ is the ANN-defined mapping function.
(ii) Apply hypothesis $h_{t-1}$ to $D_t$ and find the pseudo-error.

(iv) Update the ANN's mapping function for $D_t$:

where $N_t$ is a normalization constant.
(v) Develop the new hypothesis $h_t$ from $D_t$ and $P_t$ using $h_j = f_{NN}(h_{j-1}, D_j)$.
(vi) Repeat the above procedure for $D_{t+1}$.

The final hypothesis is:

where $T$ is the set of hypotheses developed incrementally over the learning lifetime.
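Steps (ii) to (iv) of the learning procedure can be pictured with a small sketch in the style of boosting-based incremental learners. Examples on which the previous hypothesis has a large error receive more weight. The weights are then divided by a normalization constant, which plays the role of $N_t$. The specific rules here are assumptions chosen for illustration, not the paper's formulas. The pseudo-error is taken as the weighted mean relative error, and each weight is scaled by one plus that example's relative error.

```python
from typing import List, Sequence, Tuple


def reweight(
    initial_weights: Sequence[float],
    predictions: Sequence[float],
    targets: Sequence[float],
) -> Tuple[float, List[float]]:
    """Return (pseudo_error, new_weights) under an assumed, illustrative update rule.

    initial_weights: estimated initial distribution over the new examples
    predictions: outputs of the previous hypothesis h_{t-1} on those examples
    """
    # Relative absolute error of h_{t-1} on each new example.
    errs = [abs(p - y) / max(abs(y), 1e-12) for p, y in zip(predictions, targets)]
    total = sum(initial_weights)
    # Pseudo-error: mean error weighted by the estimated initial distribution.
    pseudo_error = sum(w * e for w, e in zip(initial_weights, errs)) / total
    # Hard-to-learn examples (larger error) are up-weighted ...
    raw = [w * (1.0 + e) for w, e in zip(initial_weights, errs)]
    n_t = sum(raw)  # ... and n_t normalizes the weights into a distribution.
    return pseudo_error, [r / n_t for r in raw]
```

For instance, with equal initial weights, an example predicted exactly keeps the smallest weight. An example with 100% relative error ends up with twice its original share before normalization.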

The mapping function connects past experience to the new data in an adaptive fashion. There are many ways to design the mapping function. We implemented it as a nonlinear regression through an ANN-based function approximation, owing to its flexibility. Any such neural-network-based function approximation technique can be used. As an illustration, we use a Multilayer Perceptron (MLP) in this paper, as shown in Figure 5.

The input $x_i$ of example $i$ is an $n$-dimensional vector (for example, the numbers of infections, deaths and recoveries in an epidemic). The distribution function is currently estimated as $J_{t-1}$, and $W$ represents the weights of a layer. Backpropagation is used to tune the weights $W$ of the different layers, where the error function is defined as

$$e(k) = J_{t-1}(k-1) - P_{t-1}(k-1),$$

where $k$ is the training epoch of the backpropagation. The neural network gives the following output:

$$g_f(k) = \frac{1}{1 + \exp(-h_f(k))}, \qquad f = 1, \dots, N_h,$$

$$h_f(k) = \sum_{q=1}^{n} w^{(1)}_{f,q}(k)\, x_{i,q}(k), \qquad f = 1, \dots, N_h.$$

Here $h_f$ represents the input to the $f$th hidden node and $g_f$ its output. The hidden outputs $g_f$ form the input to the final node, $N_h$ is the number of hidden neurons, and $n$ is the total number of inputs. The weights of the ANN are updated by the backpropagation strategy defined above, as follows. For the hidden-to-output weights, the chain rule starts from

$$\frac{\partial E(k)}{\partial J(k)} = e(k).$$

Similarly, the weight adjustment for the input-to-hidden layer is described as:

$$\Delta w^{(1)}_{f,q} = \alpha(k)\left[-\frac{\partial E(k)}{\partial w^{(1)}_{f,q}(k)}\right],$$

$$\frac{\partial E(k)}{\partial w^{(1)}_{f,q}(k)} = \frac{\partial E(k)}{\partial J(k)}\,\frac{\partial J(k)}{\partial w^{(1)}_{f,q}(k)}, \qquad \frac{\partial E(k)}{\partial J(k)} = e(k),$$

where $\alpha(k)$ is the learning rate. Estimating the initial distribution function $\hat{P}_t$ for $D_t$ requires only the feedforward path of the MLP.
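A minimal sketch of a single-hidden-layer MLP and its backpropagation update follows. It assumes sigmoid hidden and output units with no bias terms, matching the hidden-layer sum $h_f = \sum_q w^{(1)}_{f,q} x_{i,q}$. It also assumes a squared error $E = \tfrac{1}{2}e^2$, consistent with $\partial E/\partial J = e$, and a fixed learning rate. The layer sizes, initialization, output activation and names are assumptions, not the authors' settings.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """One hidden layer; sigmoid hidden and output units; no biases (assumed)."""

    def __init__(self, n: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        # h_f = sum_q W1[f][q] * x[q]  and  g_f = sigmoid(h_f)
        g = [sigmoid(sum(w * xq for w, xq in zip(row, x))) for row in self.W1]
        v = sum(w * gf for w, gf in zip(self.w2, g))  # input to the output node
        return g, sigmoid(v)                           # network output J

    def step(self, x: Sequence[float], target: float, lr: float = 0.5) -> float:
        """One backpropagation update; returns E = 0.5 * e**2 measured before the update."""
        g, J = self.forward(x)
        e = J - target                 # dE/dJ = e
        delta_out = e * J * (1.0 - J)  # dE/dv
        for f, gf in enumerate(g):
            # dE/dh_f, computed with the old output weight w2[f]
            delta_hidden = delta_out * self.w2[f] * gf * (1.0 - gf)
            self.w2[f] -= lr * delta_out * gf          # hidden-to-output update
            for q, xq in enumerate(x):
                self.W1[f][q] -= lr * delta_hidden * xq  # input-to-hidden update
        return 0.5 * e * e
```

Computing `delta_hidden` before changing `w2[f]` keeps the update a true gradient step with respect to the weights used in the forward pass.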

Here, the model is validated for Covid-19 across the whole of India. The first 80% of the data was used for training and the remaining 20% for testing, as shown in Figure 6.

The plots in Figure 6 show that the results given by the model during testing are very close to the actual data. The inputs and outputs of this algorithm were:

Inputs: number of new infections, deaths and recoveries; rate of vaccination.

Outputs: the SIRVD model parameters.

These outputs are fed to the analytical SIRVD model at every time instant at which a new set of input data is received, to simulate and forecast different scenarios. Model validation for each of the five worst hit states is presented separately in the Results and Discussion section.
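The 80/20 validation protocol described for the country-wide data is chronological. The model is trained on the earliest days and tested on the most recent ones. A minimal sketch follows, assuming the data are held in a list ordered by date. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(series: Sequence[T], train_frac: float = 0.8) -> Tuple[List[T], List[T]]:
    """Split a date-ordered series into (train, test) without shuffling."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    cut = int(len(series) * train_frac)  # by default, the first 80% of days go to training
    return list(series[:cut]), list(series[cut:])
```

Shuffling before splitting, as is common for non-temporal data, would let the model see days later than those it is tested on. The forecast error would then look better than it really is.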

For forecast error calculation, percentage mean absolute error (MAE) has been defined as:

where $y$ represents the observed values and $\hat{y}$ the predicted values for $n$ observations. Figures 7-11 show the forecast of Covid-19 growth in the worst hit states of India for the next 30 days. These results are summarized in Table 1.
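The defining equation for the percentage error is not reproduced in this text. One common percentage form averages $|y - \hat{y}|/|y|$ over the $n$ observations and multiplies by 100, i.e. the mean absolute percentage error. The sketch below assumes that definition, which may differ from the authors' exact formula. The function name is hypothetical.

```python
from typing import Sequence


def percentage_mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed definition: (100 / n) * sum(|y - y_hat| / |y|); observed values must be nonzero."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    if any(y == 0 for y in observed):
        raise ValueError("observed values must be nonzero")
    n = len(observed)
    return 100.0 * sum(abs(y - yh) / abs(y) for y, yh in zip(observed, predicted)) / n
```

For example, observed counts `[100, 200]` and forecasts `[90, 220]` are each off by 10%, giving a percentage error of 10.0.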

Maharashtra, with the highest number of recorded cases, accounts for 30% of the country's total caseload. Mumbai, the capital of Maharashtra and dubbed the financial capital of India, has witnessed the steepest rise in Covid-19 cases and is currently the Covid-19 epicentre of India. This is largely due to its high population density, tourist influx and dependence on public transport. Dharavi, a locality in Mumbai, is of particular concern. It is considered one of Asia's largest slums, with an area of just over 2.1 square kilometres and a population of about 1,000,000, so the rules of social distancing are difficult to observe there. Further, Mumbai has a shortage of ICU beds and of dedicated COVID-19 hospitals. As shown in Figure 7, Maharashtra is expected to see a continuous upsurge of cases and deaths in the coming weeks. The MAE describing the forecast error for Maharashtra over the past 30 days is as low as 1.86%, which can be attributed to extensive testing and reporting.

Delhi, the capital of India, stands next to Maharashtra in bearing the brunt of the Covid-19 pandemic. Our model records an MAE of 19.79% for Delhi over the past 30 days. The apparent reasons for this relatively large error are a low testing rate and higher mobility of infected people. After the second phase of the lockdown, the government of India allowed migrant workers and people from other states stranded in Delhi to move to their home states. This violated the model assumption of zero inflow and outflow of population. Further, previously missed reports of deaths were recently added to the official data by the Ministry of Health, which increased the forecast error. As shown in Figure 8, Delhi will continue to see a moderate rate of increase in the number of cases in the coming weeks, partly owing to its now upgraded quarantine facilities and health care.

The scenario for Tamil Nadu, shown in Figure 9, is similar to that of Delhi, except that the forecast error is lower owing to better testing and reporting in the state. Further, the mortality rate in Tamil Nadu is low owing to its relatively better health infrastructure; Chennai, the capital of Tamil Nadu, is known for medical tourism in India. Nevertheless, Tamil Nadu is expected to see a steep rise in the number of cases and deaths over the next month, and the administration should frame its policies accordingly.

The case of Gujarat, shown in Figure 10, is alarming, as its mortality rate of 5.93% is almost double the national average. This can be ascribed to delays in testing and the relatively low quality of public health care. The total number of cases is expected to rise to up to 48,000 within the next month. Uttar Pradesh, shown in Figure 11, has comparatively few cases given that it is the most populous state in India, with around 204 million people. Even so, its Covid-19 cases are expected to more than double within the next month.
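The comparison with the national average can be checked against the country-wide totals for 27 June 2020 quoted earlier: 508,953 cases, 295,881 recoveries and 15,685 deaths. The sketch assumes that "mortality rate" means deaths divided by total confirmed cases, i.e. the crude case fatality rate. The paper does not spell out its definition, and the national figure at the date of the Gujarat estimate may differ slightly.

```python
total_cases = 508_953
recoveries = 295_881
deaths = 15_685

active = total_cases - recoveries - deaths   # matches the 197,387 stated in the text
national_cfr = 100.0 * deaths / total_cases  # crude case fatality rate, in percent
gujarat_cfr = 5.93                           # percent, as reported in the text
ratio = gujarat_cfr / national_cfr

print(f"active={active:,}  national CFR={national_cfr:.2f}%  Gujarat/national={ratio:.2f}")
# prints: active=197,387  national CFR=3.08%  Gujarat/national=1.92
```

Under this definition, Gujarat's rate is about 1.9 times the national figure, consistent with "almost double".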

This work studies the spread of Covid-19 in India's five worst hit states. An analytical epidemiological model was developed for the Covid-19 pandemic using an ANN-based adaptive online incremental learning technique. The model parameters are continuously updated so that the model intelligently adapts to new data sets. In a scenario of continuously evolving training data, unlike typical deep learning techniques, the model eliminates the need to retrain or rebuild it from scratch every time a new training data set is received. India was taken as a case study. However, the model can be applied to any population in the world. It could be a useful tool for policy makers, health officials and researchers in decision making, policy formulation, and epidemic monitoring and forecasting. The model is non-intrusive, adaptive, intelligent, real-time and online in nature. It is therefore designed to avoid losses of accuracy, reliability or computational performance that could otherwise arise from several sources. These include long run times, growing training data, computational complexity, changes in transmission dynamics due to mutations in a virus or bacterium, and changes in prevention mechanisms or government policies. Even if an epidemic continues for decades worldwide, the model is intended to keep working efficiently on a daily basis without retraining from scratch or degradation of its run-time environment (RTE). This work demonstrates the applicability and usefulness of incremental learning, as an adaptive deep learning strategy, in data science problems where training data are not available beforehand but evolve continuously with time. The modeling and simulation work was carried out in MATLAB.

The simulation results suggest an alarming rise in the number of cases in the coming weeks, with flattening of the infection curves still far from sight. This should prompt health care officials and policy makers to take appropriate decisions to combat the spread of the disease. It should be noted that the estimates provided here are only as good as the data provided by official sources. Under-testing and under-reporting of the disease can introduce unavoidable errors into the forecasts. Nevertheless, the information in this paper can provide significant insights into the pattern of disease spread. It can also help in making the realistic assessment of the situation that optimal policy framing requires.

The doctoral research funding from Ministry of Human Resource Development, Government of India, in favor of the first author is duly acknowledged.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


WHO Director-General's opening remarks at the mission briefing on COVID-19

Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1

COVID-19 ARDS: clinical features and differences to usual pre-COVID ARDS

Contributions to the mathematical theory of epidemics

Mathematical Biology: 1. An Introduction

Modeling influenza epidemics and pandemics: insights into the future of swine flu (H1N1)

Three months of covid-19: A systematic review and meta-analysis

Evaluation and prediction of covid-19 in india: A case study of worst hit states

Covid-19

Covid-19 pandemic lockdown in india

Transmission of influenza: implications for control in health care settings

A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data

A small-world-like model for comparing interventions aimed at preventing and controlling influenza pandemics

Transmissibility of 1918 pandemic influenza

Analysis of recruitment and industrial human resources management for optimal productivity in the presence of the HIV/AIDS epidemic

Murray, Transmission dynamics and control of severe acute respiratory syndrome

Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health intervention

Antibody responses to SARS-CoV-2 in patients with COVID-19

Overview of Some Incremental Learning Algorithms

End-to-End Incremental Learning

Incremental learning algorithms and applications