key: cord-0064125-eo63zghp
authors: Guha, Paramita
title: Spatiotemporal Analysis of COVID-19 Pandemic and Predictive Models based on Artificial Intelligence for different States of India
date: 2021-06-09
journal: J
DOI: 10.1007/s40031-021-00617-2
sha: 3fef61211f1d4654d99cd48e15419ef0d3568348
doc_id: 64125
cord_uid: eo63zghp

Geographical and spatial diversity plays an important role in the dynamics of the spread of the COVID-19 virus, yet this phenomenon has not been properly addressed in the literature. In this paper, COVID-19 data for various states of India are collected, processed and analysed using open-source software. A framework based on a Susceptible, Infectious, Hospitalised, Recovered and Deaths model is developed to determine the effects of the geographical diversity of Indian states on the COVID-19 pandemic. The confirmed, cured and death cases caused by the virus are analysed for the different states, and the reasons behind the differences in case numbers between states are identified. An improved Long-Short-Term-Memory algorithm is developed to forecast the virus spread and the recovery of patients over the next month. Numerical results are given along with a discussion.

The novel coronavirus COVID-19 has created havoc all over the world and caused an unparalleled health emergency in recent history. The outbreak started in Wuhan, in China's Hubei Province, in December 2019 and has since spread worldwide, affecting more than 2 crore (20 million) people and causing around 7.5 lakh (750,000) deaths. The first confirmed case in India was reported on 30 January 2020. At the time of writing, more than 22 lakh (2.2 million) people in India have been affected and about 45 thousand have died. Mathematical modelling of the temporal and spatial dynamics is important for characterising the epidemiology of the disease. The transmission dynamics of infections are discussed in the literature [1].
This work is important for understanding the early stages of any new infectious disease outbreak. Beyond giving insight into epidemiological characteristics, temporal and spatial dynamics can be used to forecast the potential future burden of the disease and to identify disease hot spots. Researchers [2, 3] have used spatiotemporal analysis to identify the drivers of local transmission and the populations at risk, and to guide the design of targeted interventions in resource-limited settings. Temporal trajectories of the disease under different population and intervention scenarios are derived and discussed in [1, 4-7]. Investigators [8-10] have assessed the spatiotemporal dynamics of the disease. Nevertheless, the literature shows that research on disease transmission dynamics has not progressed much. In [11-15], characteristics of different infectious diseases such as respiratory infections are discussed with respect to variation across geographical locations, uneven distribution of population and spatial diffusion of pathogens. The relation between the unequal distribution of healthcare-system capacity and disease outcomes is discussed in [16]. Regions across India have not been equally affected by the virus: the northern and western states are the most affected, whereas the North-East states and the islands are the least affected. The underlying reasons for these differences are not yet well understood. Studies in other countries show that socio-economic conditions such as larger populations and more footprints, environmental factors such as air pollution, and the location of major airports and highways are key factors behind geographical differences in COVID-related outcomes [17-20]. Such studies have rarely been carried out in the Indian context.
Hence, in this paper, a framework based on the Susceptible, Infectious, Hospitalised, Recovered and Deaths (SIHRD) model is developed to determine the effects of the geographical diversity of Indian states on the COVID-19 pandemic. An improved Long-Short-Term-Memory algorithm is developed to forecast the virus spread and the recovery of patients over the next month. The COVID-19 data for various states of India were collected from the website www.kaggle.com, accessed on 7 August 2020. The numbers of confirmed, cured and death cases for the different states were collected and stored in a .csv file. The numbers of hospitals and of beds available in urban and rural areas were also collected from the same website. The data were cleansed and pre-processed, and missing points were imputed, using Python open-source software. The data were then sorted in ascending order to give a clear picture of the status of the different states. A mathematical model simulating the transmission dynamics of the disease has been developed. The data have been classified according to the SIHRD model, i.e. Susceptible (S), Infected (I), Hospitalized (H), Recovered (R) and Death (D), and the SIHRD model has been calibrated using the data obtained from the website. Details of the classifier model are given in the next section. The literature shows that airports, hubs, dense populations and main roads are important geospatial attributes in the spread of coronavirus disease, hospitalization and mortality [17, 21]. Based on this idea, the states of India are classified into two categories: states with high populations, airports handling more than 50 thousand passengers per year and major highways passing through them; and states with lower populations and without major airports or highways.
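The cleansing, imputation and ordering steps above might look like the following minimal Python sketch. It is not the author's code: the column names (`state`, `date`, `confirmed`, `cured`, `deaths`), the toy rows, the use of linear interpolation for imputation and the hypothetical `population_threshold` in the two-category split are all assumptions made for illustration.

```python
import csv
import io

# Toy rows in an ASSUMED schema; the real Kaggle file's columns may differ.
SAMPLE_CSV = """state,date,confirmed,cured,deaths
A,2020-07-01,100,40,2
A,2020-07-03,120,50,4
A,2020-07-02,,45,
B,2020-07-01,10,5,0
B,2020-07-02,12,,0
B,2020-07-03,15,8,1
"""

COLUMNS = ("confirmed", "cured", "deaths")


def interpolate(values):
    """Impute None entries by linear interpolation between known neighbours.

    Gaps at either end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def load_series(text):
    """Group rows by state, sort each state's rows by date and impute gaps."""
    rows_by_state = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows_by_state.setdefault(row["state"], []).append(row)
    series = {}
    for state, rows in rows_by_state.items():
        rows.sort(key=lambda r: r["date"])  # ISO dates sort correctly as strings
        series[state] = {"date": [r["date"] for r in rows]}
        for col in COLUMNS:
            raw = [float(r[col]) if r[col] else None for r in rows]
            series[state][col] = interpolate(raw)
    return series


def rank_states(series, column="confirmed"):
    """State names in ascending order of their latest value in `column`."""
    return sorted(series, key=lambda s: series[s][column][-1])


def is_high_exposure(population, airport_passengers_per_year, has_major_highway,
                     population_threshold):
    """Two-category split from the text; `population_threshold` is hypothetical."""
    return (population >= population_threshold
            and airport_passengers_per_year > 50_000
            and has_major_highway)


series = load_series(SAMPLE_CSV)
print(series["A"]["confirmed"])  # [100.0, 110.0, 120.0]
print(rank_states(series))       # ['B', 'A']
```

Linear interpolation is only one reasonable imputation choice for cumulative daily counts; forward-filling would be a simpler alternative.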
The dynamics of disease transmission and infection progression in each group have been modelled using the epidemiological compartments of the SIHRD COVID-19 model. The outbreak of the novel coronavirus was first reported at the end of December 2019 in Wuhan, in China's Hubei Province. Within three months the virus had spread all over the world; so far it has affected more than 202 lakh people and caused 7.5 lakh deaths. India has also been badly affected, with more than 23 lakh people affected and around 45 thousand deaths. The WHO, the ICMR and other government organizations of India are working to minimize the spread of this deadly virus. Mathematical modelling of such infectious diseases can be an essential part of this effort, since a well-designed disease model can help predict the likely course of the epidemic and reveal the most promising realistic strategies for combating it. In this paper, two approaches are combined to model the COVID-19 data: data modelled with SIHRD are passed through an artificial-intelligence-based network to forecast the transmission and spread of the virus. Both approaches are discussed here. The most popular method for modelling and forecasting a virus outbreak is the compartmental method based on the Susceptible (S)-Infectious (I)-Recovered (R) (SIR) framework. The compartments are related to each other through a set of nonlinear coupled equations. The Susceptible compartment contains individuals who have little or no immunity to the disease and are therefore prone to infection. People in this group can easily move into the Infectious compartment after contact with an infected person. The literature indicates that around 75% of the total population of India belongs to this class. The next compartment of the model is the Infectious compartment.
Individuals in this compartment have the disease and can spread it to others. They move into the Recovered compartment after recovering from the illness. COVID-19 patients fall into two categories, symptomatic and asymptomatic; only symptomatic persons are considered here, and data for them are collected and analysed. The last compartment of this simple model is Recovered. Individuals in this group have recovered from the infection and are assumed to have developed immunity through the exposure. Virologists have observed that persons who recover from COVID-19 may not acquire long-lasting immunity of the kind conferred by other diseases (SARS, CoV-2, chicken pox, etc.). However, for simplicity, this paper assumes that recovered persons have long-lasting immunity. A simplified block diagram of the SIR model is shown in Fig. 1. In this model, S, I and R denote the numbers of susceptible, infectious and recovered individuals. Here, l* is the control input, in terms of individuals becoming susceptible to the disease; l is the mortality rate; k is the rate at which susceptible individuals become infectious; and c is the rate at which infectious individuals recover from the disease. The parameter k is proportional to the transmission rate b. The rate of change of the susceptible population is given by

$$\frac{dS}{dt} = -k(I)\,S \qquad (1)$$

where k(I) depends on the number of infectious individuals I and on N, the total population of the city or country. Solving the coupled equations gives the number of persons in each compartment. The SIR model is very simple and does not account for hospitalization and death. Hence, to make the model more practical, two more compartments are added in this paper to form an improved SIHRD model. The block diagram of SIHRD is shown in Fig. 2.
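Only the susceptible equation (1) survives in the text, so the sketch below assumes the textbook SIR closure k(I) = bI/N, dI/dt = k(I)S - cI and dR/dt = cI, and it omits the l* and l terms. It integrates the model with forward Euler steps and calibrates b and c by least squares using a plain grid search; the step size, the grid and the synthetic data are illustrative choices, not the paper's method.

```python
def simulate_sir(b, c, s0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of an ASSUMED textbook SIR model.

    dS/dt = -k(I) S,  dI/dt = k(I) S - c I,  dR/dt = c I,  with k(I) = b I / N.
    Returns one (S, I, R) tuple per day, starting at day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    out = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            k = b * i / n                 # force of infection k(I)
            new_infections = k * s * dt
            new_recoveries = c * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        out.append((s, i, r))
    return out


def sse(b, c, observed_i, s0, i0, r0):
    """Sum of squared errors between simulated and observed infectious counts."""
    simulated = simulate_sir(b, c, s0, i0, r0, len(observed_i) - 1)
    return sum((sim[1] - obs) ** 2 for sim, obs in zip(simulated, observed_i))


def fit_sir(observed_i, s0, i0, r0, b_grid, c_grid):
    """Least-squares calibration of (b, c) by exhaustive grid search."""
    candidates = [(b, c) for b in b_grid for c in c_grid]
    return min(candidates, key=lambda p: sse(p[0], p[1], observed_i, s0, i0, r0))


# Synthetic "observations" generated from known parameters (not real data).
truth = simulate_sir(0.3, 0.1, 9990, 10, 0, days=60)
observed = [state[1] for state in truth]
b_grid = [x / 100 for x in range(20, 41)]   # 0.20 .. 0.40
c_grid = [x / 100 for x in range(5, 16)]    # 0.05 .. 0.15
b_hat, c_hat = fit_sir(observed, 9990, 10, 0, b_grid, c_grid)
print(b_hat, c_hat)  # 0.3 0.1
```

A grid search keeps the example dependency-free; with real, noisy data a gradient-based least-squares solver and an adaptive ODE integrator would normally be preferred.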
The model for the different states is described mathematically by a set of coupled equations in which the hazard rate of infection for state i is given by $k_i$; the reduction in the hazard rate after the lockdown ended is given by e; and the natural recovery rate without hospitalization is denoted by 1/d. The hospitalization and discharge rates are denoted by g and r, respectively, and the mortality rates in hospitals and in home quarantine by l and w, respectively. All of these are fitted parameters, obtained by least-squares estimation from the input data. Once the modelled data are obtained, they are forecast with a Long-Short-Term-Memory (LSTM)-based recurrent neural network (RNN). LSTM is the most popular and common form of RNN. It avoids the long-term dependency problem and is suitable for processing and predicting time series. It consists of a set of memory cells that replace the hidden layers of an ordinary RNN, and its key element is the state of the memory cells. Information is filtered through a gate structure that maintains and updates the memory-cell state. An LSTM has input, forget and output gates, and each memory cell contains three sigmoid layers and one tanh layer. The structure of the LSTM cell is shown in Fig. 3. The algorithms discussed above have been applied to the downloaded data, and relations between the confirmed, cured and death cases and hospital-bed availability are obtained. As discussed earlier, the data were obtained from the www.kaggle.com website, and the data for the different states were processed and cleansed using Python open-source software. The data from each state were modelled with the SIHRD framework, and the confirmed, cured and death cases are plotted. Maharashtra, Tamil Nadu, Delhi, Gujarat and Karnataka are the five states most severely affected by COVID-19, whereas regions such as Sikkim, Andaman and Nicobar, Meghalaya, Mizoram and Andhra Pradesh are the least affected.
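As a minimal sketch of the gate structure described above (not the paper's improved LSTM, whose modifications are not specified), the following pure-Python cell applies three sigmoid layers (forget, input and output gates) and one tanh candidate layer. A hypothetical `make_windows` helper turns a daily series into (past window, next value) pairs of the kind used to train a forecaster. The weights are random and untrained; a real forecaster would train them, for example with a framework such as PyTorch's `torch.nn.LSTM`, and add an output layer.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vec):
    """Compute W @ vec + bias for W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


class LSTMCell:
    """Standard LSTM cell: forget, input and output gates (three sigmoid
    layers) plus one tanh candidate layer. Weights are random, not trained."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        width = hidden_size + input_size

        def layer():
            weights = [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                       for _ in range(hidden_size)]
            return weights, [0.0] * hidden_size

        self.hidden_size = hidden_size
        self.forget, self.input, self.candidate, self.output = (
            layer(), layer(), layer(), layer())

    def step(self, x, h, c):
        z = list(h) + list(x)  # concatenate previous hidden state and input
        f = [sigmoid(v) for v in affine(*self.forget, z)]      # what to keep
        i = [sigmoid(v) for v in affine(*self.input, z)]       # what to write
        g = [math.tanh(v) for v in affine(*self.candidate, z)]  # candidate values
        o = [sigmoid(v) for v in affine(*self.output, z)]      # what to expose
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h, c


def make_windows(series, window):
    """Hypothetical helper: (previous `window` values, next value) pairs."""
    return [(series[t:t + window], series[t + window])
            for t in range(len(series) - window)]


# Toy scaled series (not real data): each window is fed one value per time step.
pairs = make_windows([0.1, 0.2, 0.3, 0.4, 0.5], window=3)
cell = LSTMCell(input_size=1, hidden_size=4)
h, c = cell.run([[v] for v in pairs[0][0]])
```

The forget gate scales the old cell state while the input gate scales the new tanh candidate, which is what lets the cell carry information across many time steps.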
It is also known that the severely affected states have airports with large numbers of footprints, high populations and major highways (Figs. 4, 5, 6, 7, 8, 9). In this paper, the model has been developed using data from 26 March 2020 to 28 July 2020. The number of footprints in the most affected regions during this period is taken as 10 million, whereas the least affected regions had … The model estimated a total of 22.59 lakh COVID-19 cases (at the 95% confidence level), compared with 20.23 lakh confirmed cases in the data collected on 7 August 2020. Similarly, the numbers of deaths and cured cases at the 95% confidence level are estimated as 43.671 thousand and 12.33 lakh, respectively. Compared with the original data, the overall errors for death and cured cases are 5.23% and 3.95%, respectively.

In this paper, online COVID-19 data for India have been obtained from an open-access data source and processed and cleaned using Python software. A framework based on the SIHRD model has been developed to model the data for the different states of India. The analysis shows that regions across India have not been equally affected by the virus: the northern and western states are the most affected, whereas the North-East states and the islands are the least affected. The main reasons for this are socio-economic conditions such as higher populations and more footprints, environmental factors such as air pollution, and the location of major airports and highways. Also, an improved Long-Short-Term-Memory algorithm has been developed to forecast the virus spread and the recovery of patients over the next month.

Early dynamics of transmission and control of COVID-19: a mathematical modelling study
Targeting the right interventions to the right people and places: the role of geospatial analysis in HIV program planning
"Know your epidemic, know your response": a useful approach, if we get it right
Abu-Raddad, Characterizing key attributes of the epidemiology of COVID-19 in China: model-based estimations, medRxiv (2020a)
Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts
Quantifying early COVID-19 outbreak transmission in South Africa and exploring vaccine efficacy scenarios
The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study
Spread and dynamics of the COVID-19 epidemic in Italy: effects of emergency containment measures
Spatial-temporal distribution of COVID-19 in China and its prediction: a data-driven modeling analysis
Population flow drives spatio-temporal distribution of COVID-19 in China
Epidemiology of seasonal influenza in the Middle East and North Africa regions, 2010-2016: circulating influenza A and B viruses and spatial timing of epidemics
Admission to hospital for bronchiolitis in England: trends over five decades, geographical variation and association with perinatal characteristics and subsequent asthma
A systematic review of COVID-19 epidemiology based on current evidence
Spatial dynamics and genetics of infectious diseases on heterogeneous landscapes
Impact of pollution, climate, and sociodemographic factors on spatiotemporal dynamics of seasonal respiratory viruses
Potential association between COVID-19 mortality and health-care resource availability
Identification of vulnerable populations and areas at higher Risk of COVID-19 Related Mortality in the
Geographic differences in COVID-19 cases, deaths, and incidence - United States
COVID-19 exacerbating inequalities in the US
Exposure to air pollution and COVID-19 mortality in the United States
Identification of vulnerable areas of high COVID-19 mortality risk in Ohio
(Press release)
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions