title: Temporal Deep Learning Architecture for Prediction of COVID-19 Cases in India
authors: Verma, Hanuman; Mandal, Saurav; Gupta, Akshansh
date: 2021-08-31

To combat the recent coronavirus disease 2019 (COVID-19), academicians and clinicians are searching for new approaches to predict the dynamic trends of the COVID-19 outbreak, which may help slow down or stop the pandemic. Epidemiological models such as the Susceptible-Infected-Recovered (SIR) model and its variants help to understand the dynamic trend of the pandemic and may be used in decision making to optimize possible controls of the infectious disease. However, these epidemiological models rest on mathematical assumptions and may not predict the real pandemic situation. Recently, machine learning approaches have been used to understand the dynamic trend of COVID-19 spread. In this paper, we design recurrent and convolutional neural network models (vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and a hybrid CNN+LSTM model) to capture the complex trend of the COVID-19 outbreak and to forecast COVID-19 daily confirmed cases 7, 14, and 21 days ahead for India and its four most affected states (Maharashtra, Kerala, Karnataka, and Tamil Nadu). The root mean square error (RMSE) and mean absolute percentage error (MAPE) evaluation metrics are computed on the testing data to demonstrate the relative performance of these models. The results show that the stacked LSTM and hybrid CNN+LSTM models perform best relative to the other models.

The coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in Wuhan city, China, in December 2019 [1].
It is categorized as an infectious disease and spreads among people through close contact with infected persons, generally via small droplets produced by coughing, sneezing, or talking, and through contaminated surfaces. On March 11, 2020, the World Health Organization (WHO) declared COVID-19 a pandemic. In India, the first case of COVID-19 was reported in Kerala on January 30, 2020; the disease gradually spread throughout India, especially in urban areas, and India witnessed the first wave of COVID-19. India witnessed the second wave in March 2021, which was much more devastating than the first, with shortages of hospital beds, vaccines, oxygen cylinders, and medicines in parts of the country. To fight COVID-19, the country has vaccination, herd immunity, and epidemiological interventions as a few possible options. In the early stage of COVID-19, India imposed complete as well as partial lockdowns as epidemiological interventions during the first wave, which slowed the transmission rate, delayed the peak, and resulted in fewer COVID-19 cases. India is the second most populous country in the world, with 68.84% of its population living in rural areas and 31.16% in urban areas. The population density in northeast India is low in comparison to other states of India. The chance of infection depends on the spatial distance between contacts, so a low-density population is less prone to infection than a high-density population. Individual behaviour (social distancing, frequent hand sanitization, wearing a mask, etc.) also plays a key role in controlling the spread of COVID-19. Predicting the new COVID-19 cases per day will help administrators and planners take proper decisions and make effective policy to tackle the pandemic situation.

Email addresses: hv4231@gmail.com (Hanuman Verma), saurav.mnnit@gmail.com (Saurav Mandal), akshanshgupta@ceeri.res.in (Akshansh Gupta)

Epidemiological models are very helpful for understanding the trend of COVID-19 spread and for predicting the spread rate, the duration, and the peak of the infectious disease. They can be used for short-term and long-term predictions of new confirmed COVID-19 cases per day, which may support decision making to optimize possible controls of the infectious disease. In the literature, several mathematical models for infectious diseases have been introduced, such as logistic models [2], generalized growth models [3], Richards models [4], sub-epidemic wave models [5], the Susceptible-Infected-Recovered (SIR) model [6], and the Susceptible-Exposed-Infectious-Removed (SEIR) model. The SIR model is a compartmental model that treats the whole population as closed and divides it into susceptible, infected, and recovered compartments. Each infected person infects, on average, R0 other persons in a fully susceptible population; R0 is known as the basic reproduction number. Recently, several works have used the SIR model and its variants to predict the COVID-19 outbreak [7, 8, 9, 10, 11]. These epidemiological models are good for understanding the trend of COVID-19 spread, but they are built on several assumptions that generally do not hold on real-life data [12]. Their predictions can be unreliable because the spread of infection follows a complex trend that depends on population density, travel, and individual social aspects such as culture and lifestyle. Therefore, there is a need for deep learning approaches to accurately predict the COVID-19 trends in India.

In deep learning, the convolutional neural network (CNN) [13] is an architecture for processing data that has a grid-like topology. This includes time series data, which can be considered a 1D grid of samples taken at regular time intervals, and image data, which can be considered a 2D grid of pixels.
A typical end-to-end CNN consists of different layers such as convolution, activation, max-pooling, and softmax layers. A recurrent neural network (RNN) [14], derived from feedforward neural networks, can use its internal state (memory) to process variable-length sequences, which makes it suitable for sequential data. Long Short-Term Memory (LSTM), introduced by Hochreiter and Schmidhuber [15], mitigates the vanishing and exploding gradient problems of RNNs and captures long-range dependencies, which has proved very promising for modelling sequential data. A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. For a given input sequence $x = (x_1, x_2, \ldots, x_T)$ from time $t = 1$ to $T$, the LSTM calculates an output sequence $y = (y_1, y_2, \ldots, y_T)$ using the update equations of [15], referred to as Equations 1 to 6. In these equations, $i$, $o$, $f$, and $c$ represent the input gate, output gate, forget gate, and cell activation vector, respectively, and $m$ depicts the hidden state vector, also known as the output vector of the LSTM unit. $W$ denotes a weight matrix; for example, $W_{ix}$ is the weight matrix from the input gate to the input. The symbol $\odot$ stands for element-wise multiplication and $b$ denotes the bias term, whereas $g$ and $h$ are the activation functions at the input and output, respectively, and $\sigma$ represents the logistic sigmoid function. A multilayer LSTM can map the input sequence to a vector of fixed dimensionality, from which another deep LSTM decodes the target sequence. This second LSTM is essentially a recurrent neural network model, except that it is conditioned on the input sequence. The LSTM can solve problems with long-term dependencies, which may arise from the introduction of many short-term dependencies into the dataset.
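For reference, a widely used peephole formulation of the LSTM updates that is consistent with the symbols defined above can be written as follows. This is a sketch: whether Equations 1 to 6 take exactly this form is an assumption, in particular the peephole weights $W_{ic}$, $W_{fc}$, $W_{oc}$ and the output activation $\phi$, neither of which is confirmed by the text.

```latex
\begin{align}
i_t &= \sigma\left(W_{ix} x_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{fx} x_t + W_{fm} m_{t-1} + W_{fc} c_{t-1} + b_f\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g\left(W_{cx} x_t + W_{cm} m_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{ox} x_t + W_{om} m_{t-1} + W_{oc} c_t + b_o\right) \\
m_t &= o_t \odot h\left(c_t\right) \\
y_t &= \phi\left(W_{ym} m_t + b_y\right)
\end{align}
```

Dropping the three peephole terms gives the variant implemented by most deep learning libraries; the gate structure is otherwise unchanged.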
LSTM can learn successfully from data with long-range temporal dependencies, which makes it suitable when there is a considerable time lag between the inputs and their corresponding outputs [16]. LSTM can be used for predicting time series and is beneficial for sequential data [17]. Deep learning models such as LSTM and CNN are well suited for understanding and predicting the dynamic trend of COVID-19 spread and have recently been used for prediction by several researchers [18, 19, 20, 21, 22, 23, 24]. Chandra et al. [25] used LSTM and its variants to forecast COVID-19 spread in India, with static and dynamic splits of the training and testing data. Chimmula & Zhang [12] used LSTMs to model COVID-19 transmission in Canada, and their results indicated linear transmission in Canada. Arora et al. [26] forecast COVID-19 cases for India using LSTM variants and categorized the Indian states into different zones based on COVID-19 cases.

In this paper, we employ the vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM models to capture the dynamic trend of COVID-19 spread and to predict the COVID-19 daily confirmed cases 7, 14, and 21 days ahead for India and its four most affected states: Maharashtra, Kerala, Karnataka, and Tamil Nadu. To demonstrate the performance of the deep learning models, RMSE and MAPE errors are computed on the testing data. The flowchart of the model is presented in Figure 1. The rest of the manuscript is organized as follows. Section 2 describes the deep learning models along with the experimental setup and evaluation metrics. Section 3 presents the COVID-19 dataset, the experimental results, and the discussion. Finally, Section 4 concludes the paper.

The COVID-19 outbreak trend is highly dynamic and depends on the intervention strategies imposed. To capture the complex trend, in this study we proceed through the following steps during training, testing, and forecasting.
• We used early COVID-19 data up to July 10, 2021, and split the COVID-19 time series into training and testing data by taking the last 20 days as testing data and the remaining data as training data.
• To reduce the inconsistency in the COVID-19 time series, the data are normalized to the interval [0,1] using the 'MinMaxScaler' function.
• The COVID-19 time series is reshaped into the model input shape by taking a time step (time lag), or observation window, of 15.

The RNN and CNN approaches, viz. vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM, have been implemented in Python using the Keras module of TensorFlow, and prediction follows a univariate approach.

A vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units. The encoder is responsible for reading and interpreting the input sequence, and its output is a fixed-length vector [27]. Used as a baseline, the vanilla LSTM can be evaluated against each of its variants, which isolates the effect on performance of the change made in each variant. The vanilla LSTM performs reasonably well on various datasets [28] and is a state-of-the-art model for a variety of machine learning problems. Vanilla LSTM neural networks can therefore predict accurately by making the most of long short-term memory, even under complicated operating conditions [29].

A stacked LSTM has more than one LSTM sub-layer, connected together through weight parameters. It overlays additional hidden LSTM layers on a single-layer LSTM [30]. In a stacked LSTM, each edge carries a weight value and the cell is the time unit.
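The train/test split and windowing steps listed above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' code: the function names and the synthetic series are hypothetical, and the text does not say how windows are formed for the test period, so only the training windows are shown.

```python
from typing import List, Tuple


def train_test_split_series(
    series: List[float], n_test: int = 20
) -> Tuple[List[float], List[float]]:
    """Hold out the last n_test observations as the test set."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


def make_windows(
    series: List[float], n_steps: int = 15
) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (window, next value) supervised pairs."""
    inputs, targets = [], []
    for start in range(len(series) - n_steps):
        inputs.append(series[start:start + n_steps])   # 15 past observations
        targets.append(series[start + n_steps])        # the value that follows
    return inputs, targets


if __name__ == "__main__":
    # Synthetic stand-in for an (already normalized) daily case series.
    daily_cases = [float(v) for v in range(100)]
    train, test = train_test_split_series(daily_cases, n_test=20)
    X, y = make_windows(train, n_steps=15)
    print(len(train), len(test), len(X), len(X[0]))
```

For a Keras recurrent or Conv1D layer, each window would additionally be reshaped to `(samples, 15, 1)`, since the series is univariate.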
The data transformation performed in a stacked LSTM can be described as follows: the input to the next hidden layer, $i^{next}$, is obtained by applying the activation function $f$ to the output values $o_m$ of the cells of the previous layer, weighted by the edge weights $w^{next}_n$ that connect the previous output to the next layer's input, plus the bias $b^{next}$. The stacked LSTM has been shown to improve feature extraction [31].

Bidirectional Long Short-Term Memory (Bi-LSTM) is a deep learning algorithm applied to forecasting time series data. It learns from both directions of the sequence, which provides a better understanding of the learning context [17]. With multivariate time series, Bi-LSTM allows multiple dependent series to be modelled together, capturing the correlations within and across the recorded variables as they vary simultaneously over time [32]. Bi-LSTM is a deep learning model for sequential prediction with low error [33]. Its further strengths include handling temporal dependencies in time series data, distribution-free learning, and flexibility in modelling non-linear features. In other words, Bi-LSTM is an enhanced version of the LSTM algorithm that combines two hidden states, allowing information to flow from the backward layer as well as from the forward layer. Bi-LSTM is helpful in situations that require contextual input. It is widely used in classification, especially text classification, sentiment classification, speech classification and recognition, and in load forecasting. Because Bi-LSTM can capture non-linear processes and is flexible in modelling time-dependent data, it is nowadays used for real-time forecasts of daily events [34].

ED-LSTM (encoder-decoder LSTM) is a sequence-to-sequence model for mapping a fixed-length input to a fixed-length output.
It handles variable-length input and output by first encoding the input sequence and then decoding the output from the encoded representation. This method computes a sequence of hidden states. In ED-LSTM, the encoder and decoder improve the continuity of learning across input and output sequences. The same units are reused sequentially to read the input sequence and to write the output sequence, and the number of reuses depends on the lengths of the input and output sequences. The ED-LSTM model is reported to be consistent, with stable, reliable, and accurate outputs, and it can effectively mimic long-term dependence between variables [35]. An advantage of ED-LSTM is that the network can be constructed from a model definition consisting of a list of inputs and outputs, so the models can be trained automatically from the provided dataset. This helps to reduce model construction and training cost [36].

CNN is a deep learning algorithm that automatically captures and identifies important features without the need for human supervision [37]. The local connections and shared weights employed in a CNN are useful for extracting features from 2-D input signals such as images. A CNN basically has three kinds of layers: convolution layers, pooling layers, and fully connected layers. The convolution layer is primarily responsible for identifying features in the raw data. This is achieved by applying filters of a predefined size through the convolution operation. The pooling layer applies a pooling operation that reduces the dimension of the feature maps while retaining the important features [38]. Common pooling methods are max pooling and average pooling. The fully connected (dense) layer generates the forecast after the feature extraction process. The final fully connected layers receive the flattened features produced by the convolution and pooling operations [39, 40].
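The convolution and pooling operations described above can be illustrated on a 1D series with a short plain-Python sketch. The function names and the example values are hypothetical; as in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
from typing import List


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Slide the kernel along the signal without padding.

    Each output value is the sum of the element-wise products between the
    kernel and the window of the signal it currently covers.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(feature_map: List[float], pool_size: int = 2) -> List[float]:
    """Keep the maximum of each non-overlapping block of pool_size values."""
    return [
        max(feature_map[i:i + pool_size])
        for i in range(0, len(feature_map) - pool_size + 1, pool_size)
    ]


if __name__ == "__main__":
    series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
    averaging_kernel = [0.5, 0.5]          # in a CNN these weights are learned
    features = conv1d_valid(series, averaging_kernel)
    print(features)                         # feature map, one value shorter
    print(max_pool1d(features, pool_size=2))
```

The feature map is shorter than the input because no padding is applied, and pooling with a block size of 2 roughly halves it again, which is the dimension reduction the text refers to.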

(I) Convolutional layer: The convolutional layer in a CNN architecture consists of multiple convolutional filters, also known as kernels. The convolution operation is performed between the raw data, in the form of a matrix, and these kernels, and it generates an output feature map. The numbers in a kernel are its weights. The initial kernel values are random; during training they are adjusted so that the kernel extracts important features from the data. The convolution operation starts from a description of the CNN input format. For example, on a 10 × 10 greyscale image, a randomly initialized kernel slides horizontally and vertically, and the dot product between the kernel and the covered region of the image is computed at each position: the corresponding values are multiplied and the products are summed to create a single scalar value. In a 1D-CNN the kernel moves in one direction only; in a 2D-CNN and a 3D-CNN it moves in two and three directions, respectively. The data processed by the kernel sometimes require padding, which adds layers of extra pixels (zeros) around the input data and so helps to preserve the information present at the borders [38].

(II) Pooling layer: The feature maps generated by the convolution operations are sub-sampled in the pooling layer, which turns large feature maps into smaller ones. By reducing the dimension of the feature map, the pooling layer reduces the number of parameters to learn and the computation to be performed. There are various types of pooling, such as average pooling, max pooling, min pooling, and global average pooling (GAP).
The pooling layer can sometimes reduce the performance of a CNN model, because it indicates whether a feature is present but discards the feature's precise location [37, 39, 40].

(III) Activation function (transfer function): In a neural network, the activation function transforms the weighted sum of a neuron's inputs into the neuron's output. It maps the input to the output and thereby decides whether or not a particular neuron fires. Activation functions can be linear or non-linear. Some of the activation functions used in CNNs are described below.

(a) Rectified Linear Unit (ReLU): The ReLU function is piecewise linear: it returns the input if the input is positive and zero otherwise. It is one of the most common activation functions in neural networks. One advantage of ReLU over other activation functions is its lower computational load [38]. Mathematically,

$$f(x) = \max(0, x)$$

(b) Sigmoid: The input is a real number and the output is constrained to lie between zero and one. It is an S-shaped function, represented mathematically as

$$f(x) = \frac{1}{1 + e^{-x}}$$

(c) Tanh: The input is a real number and the output lies between -1 and 1. It is described mathematically as

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

(IV) Fully connected layer: In this layer each neuron is connected to every neuron of the preceding layer, hence the name fully connected (FC) layer. It is located at the end of the CNN architecture and forms the last few layers of the network. The flattened output of the final pooling layer is the input to the FC layer. Flattening is the process of unrolling a matrix of values into a vector [38].

The hybrid CNN+LSTM deep learning architecture combines the benefits of both the LSTM and the CNN. The LSTM in this hybrid model learns the temporal dependencies that are present in the input data.
The CNN is integrated so that the model can process high-dimensional data. The components of the LSTM are the input gate, forget gate, output gate, memory cell, candidate memory cell, and hidden state [41]. The 1-D CNN finds the important features in the temporal feature space using the non-linear transformation generated by the LSTM. The convolution layers are wrapped in a time-distributed layer so that the data are transformed appropriately. The layers used in the model are two convolutional layers, a max-pooling layer, a flatten layer, and a time-distributed layer, followed by LSTM layers [41, 42].

To demonstrate the relative performance of the various deep learning models, the root mean square error (RMSE) and mean absolute percentage error (MAPE) have been computed, defined mathematically as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}_i\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \bar{y}_i}{y_i}\right|$$

where $y_i$ denotes the actual daily confirmed cases, $\bar{y}_i$ the daily confirmed cases predicted by the deep learning model, and $n$ the total number of observations under study. Smaller values of RMSE and MAPE indicate better performance of a model. In this study, RMSE and MAPE are computed on the test data, for which both the actual and the predicted values are available for each model. For all predictions of 7, 14, and 21 days, we also computed the 95% confidence interval [43] for the predicted count of new confirmed COVID-19 cases per day. The confidence interval gives a range of values for the new cases, together with the confidence level (here 95%) with which the estimated interval is expected to contain the true number of confirmed cases.

The RNN and CNN models are trained and tested for the analysis and forecasting of the daily confirmed COVID-19 cases. In our study we used the vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM to build a map that captures the complex trend in the given sequence of COVID-19 time series data and to perform forecasting using these maps.
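The RMSE and MAPE definitions given above can be sketched in plain Python as follows. The function names and the example numbers are hypothetical and are not values from the paper; MAPE is undefined when an actual value is zero, so the sketch rejects that case.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


if __name__ == "__main__":
    actual_cases = [100.0, 200.0, 400.0]       # illustrative values only
    predicted_cases = [110.0, 180.0, 400.0]
    print(round(rmse(actual_cases, predicted_cases), 2))
    print(round(mape(actual_cases, predicted_cases), 2))
```

RMSE is expressed in the same unit as the case counts, so it is larger for regions with more cases, whereas MAPE is a relative error and is easier to compare between India and the individual states.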
The details are discussed in the following subsections.

In this study, daily new COVID-19 cases have been predicted 7, 14, and 21 days ahead for the whole country (India) and four of its most affected states (Maharashtra, Kerala, Karnataka, and Tamil Nadu) using deep learning approaches. The COVID-19 time series data, covering January 30, 2020 to July 10, 2021, were obtained from COVID-19India.org, where the numbers of daily confirmed, recovered, and deceased cases are publicly available online at https://api.covid19india.org/documentation/csv/. We use data up to July 10, 2021, as illustrated in Fig. 2(a-e), to train and test the recurrent and convolutional neural network models. The trends in the COVID-19 time series are highly inconsistent, which may be due to the rate of individual infections, the number of cases reported, individual behaviour, the effect of lockdown, and non-pharmaceutical measures. India and its states witnessed two waves, and the new cases per day during the peak of the second wave were far higher than during the first wave, as depicted in Fig. 2. Because of the high inconsistency in the per-day counts, the data are normalized to the interval [0,1] using the 'MinMaxScaler' function (available in scikit-learn) in the preprocessing step, before the deep learning models are applied. The 'MinMaxScaler' function normalizes the given time series data $x$ using the formula $x_{normal} = (x - x_{min})/(x_{max} - x_{min})$, where $x_{max}$ and $x_{min}$ represent the maximum and minimum values of the data $x$. After forecasting, the predicted counts of confirmed cases per day, which lie in the interval [0,1], are transformed back into actual numbers by applying the reverse operation with the 'inverse_transform' function. The hyperparameters of the vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM models are summarized in Table 1 and Table 2.
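The normalization and its inverse described above can be sketched in plain Python as follows. This is an illustration of the formula only: the function names and the numbers are hypothetical, and the text does not state whether the minimum and maximum are taken from the training data or from the whole series.

```python
from typing import List, Tuple


def fit_min_max(series: List[float]) -> Tuple[float, float]:
    """Return the (minimum, maximum) used to scale the series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(series: List[float], lo: float, hi: float) -> List[float]:
    """Apply x_normal = (x - x_min) / (x_max - x_min)."""
    return [(x - lo) / (hi - lo) for x in series]


def inverse_scale(scaled: List[float], lo: float, hi: float) -> List[float]:
    """Map values in [0, 1] back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


if __name__ == "__main__":
    daily_cases = [50.0, 100.0, 150.0, 250.0]    # illustrative counts
    lo, hi = fit_min_max(daily_cases)
    normalized = scale(daily_cases, lo, hi)
    print(normalized)
    print(inverse_scale(normalized, lo, hi))
```

In scikit-learn the same two steps correspond to `MinMaxScaler().fit_transform(...)` and `inverse_transform(...)` on a column-shaped array; reusing the fitted minimum and maximum is what allows model outputs in [0,1] to be converted back to case counts.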
To avoid over-fitting, we regularize the models on the training data using an L1 regularizer (bias/kernel) with different settings, along with Dropout, as shown in Table 1 and Table 2. Around 20% to 40% of the neurons are dropped by the Dropout layers. In the CNN and hybrid CNN+LSTM models, we use the Conv1D layer with the settings given in Table 2. Throughout the experiments, the 'ReLu' activation function, the 'adamax' optimizer, and the 'MSE' loss function are used. To tune the number of training epochs, we set up the 'EarlyStopping' callback with 1000 epochs, a batch size of 64, and patience=250. This setup monitors the performance of the respective model on the training and validation datasets and stops the training when the model appears to start over-fitting. The learning algorithm is stochastic, so the results may vary from run to run [44]. To address this, we ran each deep learning model up to 10 times, saved the best-performing model, and recorded its performance results.

In this section, we discuss the prediction performance of the deep learning models for India and four of its states (Maharashtra, Kerala, Karnataka, and Tamil Nadu) individually in the following subsections.

India is the second most populous country in the world, which may increase the threat posed by the spread of COVID-19. The daily confirmed cases in India from Jan 30, 2020 to July 10, 2021 are depicted in Fig. 2(a). The new confirmed cases per day are highly inconsistent. India witnessed two waves; in the second wave around 400,000 new cases were reported per day. To address these issues, we train and test the vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and CNN+LSTM models on the normalized India time series to capture the real trend, using the manually tuned hyperparameters shown in Table 1 and Table 2.
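The training setup described earlier in this section ('ReLu' activation, 'adamax' optimizer, MSE loss, 'EarlyStopping' with 1000 epochs, batch size 64 and patience 250, and up to 10 repeated runs) can be sketched as follows. This is a hedged sketch and not the authors' implementation: the layer sizes, dropout rate, L1 strength, validation split, monitored quantity, and `restore_best_weights` setting are assumptions, since the values from Table 1 and Table 2 are not reproduced here, and the function names are hypothetical.

```python
from typing import Any, Callable, Tuple

N_STEPS = 15       # observation window from the text
EPOCHS = 1000      # from the text
BATCH_SIZE = 64    # from the text
PATIENCE = 250     # from the text
N_RUNS = 10        # "up to 10 times" in the text


def build_stacked_lstm(units: int = 50, dropout: float = 0.2, l1: float = 1e-4):
    """Hypothetical stacked LSTM; units, dropout and l1 are placeholder values."""
    from tensorflow import keras  # lazy import: best_of_runs works without TensorFlow

    model = keras.Sequential([
        keras.Input(shape=(N_STEPS, 1)),                 # univariate windows
        keras.layers.LSTM(units, activation="relu", return_sequences=True,
                          kernel_regularizer=keras.regularizers.l1(l1)),
        keras.layers.Dropout(dropout),
        keras.layers.LSTM(units, activation="relu"),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(1),                           # next-day prediction
    ])
    model.compile(optimizer="adamax", loss="mse")
    return model


def train_once(X, y) -> Tuple[Any, float]:
    """Train one model; X is a NumPy array of shape (samples, N_STEPS, 1)."""
    from tensorflow import keras

    model = build_stacked_lstm()
    stopper = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=PATIENCE, restore_best_weights=True)  # assumed
    history = model.fit(X, y, epochs=EPOCHS, batch_size=BATCH_SIZE,
                        validation_split=0.1,            # assumed split
                        callbacks=[stopper], verbose=0)
    return model, min(history.history["val_loss"])


def best_of_runs(run: Callable[[], Tuple[Any, float]],
                 n_runs: int = N_RUNS) -> Tuple[Any, float]:
    """Repeat a stochastic training run and keep the model with the lowest score."""
    best_model, best_score = None, float("inf")
    for _ in range(n_runs):
        model, score = run()
        if score < best_score:
            best_model, best_score = model, score
    return best_model, best_score
```

With windowed training arrays `X_train` and `y_train` available, a call such as `best_of_runs(lambda: train_once(X_train, y_train))` would repeat the training and keep the run with the lowest validation loss, which is one possible reading of "saved the best-performing model".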

The new cases of COVID-19 are predicted for 7, 14, and 21 days from July 10, 2021 using the recurrent and CNN models, and the corresponding performance metrics, RMSE and MAPE, are presented in Table 3. RMSE and MAPE are computed between the actual and predicted daily confirmed cases on the test data, from June 21, 2021 to July 10, 2021. Table 3 shows that the RMSE and MAPE (7.57%-11.36%) are comparatively small for the stacked LSTM and hybrid CNN+LSTM. In some cases the RMSE and MAPE (7.36%-12.96%) on the test data are lower for Bi-LSTM and ED-LSTM, but their predicted new cases per day are far from the actual cases (Fig. 3), which suggests that the Bi-LSTM and ED-LSTM models suffer from over-fitting. The predicted and actual (red color) cases for India for 7 days (up to July 17, 2021), 14 days (up to July 24, 2021), and 21 days (up to July 31, 2021) are shown in Figs. 3(a)-3(c). The stacked LSTM and hybrid CNN+LSTM provide better predictions, as their forecast counts are close to the actual counts per day. The predicted new cases for 7, 14, and 21 days with the various models, along with 95% confidence intervals, are shown in Table 4. In our study, for India, we found that the stacked LSTM and hybrid CNN+LSTM performed best in terms of prediction consistency among all six deep learning models.

Maharashtra was one of the worst-affected states in India during the second wave of COVID-19. The new cases per day are depicted in Fig. 2(b), which shows that the number of daily cases reached nearly 70,000 in the second wave and that the outbreak scenario is highly dynamic. To capture the dynamic trend of the data, we train and test the vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM models on the Maharashtra time series with the hyperparameters given in Table 1 and Table 2, and compute RMSE and MAPE on the test data, as presented in Table 3.
Further forecasts of confirmed new cases per day for 7 days (up to July 17, 2021), 14 days (up to July 24, 2021), and 21 days (up to July 31, 2021) from July 10, 2021 are shown in Table 4. Fig. 4(a)-(c) illustrates the predicted and actual cases for the deep learning models. In the 7-day prediction, the stacked LSTM (MAPE=15.55%) and Bi-LSTM (MAPE=9.95%) forecast values close to the actual values, whereas in the 14-day prediction the Bi-LSTM and ED-LSTM forecasts are close to the actual cases. Table 4 shows the 95% confidence interval for the predicted confirmed cases per day up to July 31, 2021.

We train and test the recurrent and convolutional neural network models (vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and CNN+LSTM) on Kerala COVID-19 early data from Mar 14, 2020 to Jul 10, 2021 (Fig. 2(c)) with the hyperparameters in Table 1 and Table 2 to capture the trend of daily confirmed cases, and compute RMSE and MAPE (Table 3) on the test data (the last 20 days). The RMSE and MAPE (=9.55%) values on the test data are smallest for the vanilla LSTM among the six models. Predictions for 7 days (up to July 17, 2021), 14 days (up to July 24, 2021), and 21 days (up to July 31, 2021) with the different models are shown in Table 4, and their comparison is illustrated in Figs. 5(a)-(c). The highly dynamic (zigzag) trend of the Kerala time series makes it difficult to capture. In the 7-day and 14-day predictions, CNN+LSTM forecasts confirmed cases per day close to the actual counts, and in the 21-day prediction the stacked LSTM forecast is close to the actual values.

The time series data of Karnataka depicted in Fig. 2(d) shows the dynamic trend of the data during the first and the second wave.
To address these issues and capture the trend of new cases per day, the vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM models are trained and tested on the Karnataka data with the hyperparameters shown in Table 1 and Table 2. Predictions are then made for 7 days (up to July 17, 2021), 14 days (up to July 24, 2021), and 21 days (up to July 31, 2021), as displayed in Table 4. The comparisons between the predicted and actual cases for the different models are illustrated in Figs. 6(a)-(c). In the 14-day prediction, the stacked LSTM gives the lowest MAPE (=13.43%) among the models and also predicts new cases per day close to the actual cases, whereas in the 7-day prediction the hybrid CNN+LSTM provides predicted cases per day close to the actual cases. The ED-LSTM performs better in the 21-day prediction, but in the 14-day prediction its predicted cases are far from the actual cases, possibly because of over-fitting.

The new cases per day in Tamil Nadu are depicted in Fig. 2(e), which shows that the number of daily cases reached nearly 35,000 in the second wave and that the outbreak scenario is inconsistent in nature. Further forecasts of new confirmed cases per day for 7 days (up to July 17, 2021), 14 days (up to July 24, 2021), and 21 days (up to July 31, 2021) are shown in Table 4. The comparison of predicted and actual cases per day for 7, 14, and 21 days using the deep learning models is illustrated in Fig. 7(a)-(c). All models except ED-LSTM are able to capture the declining cases in Tamil Nadu. In the 14-day and 21-day predictions, the case counts per day forecast by the vanilla LSTM, stacked LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM models are close to the actual cases (Fig. 7(a)-(c)).

The COVID-19 outbreak is a potential threat because of its dynamical behaviour, and it is more threatening in a country like India, which is very densely populated.
Researchers are seeking new approaches to understand the COVID-19 dynamics that overcome the limitations of existing epidemiological models. In this study, we designed the vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and hybrid CNN+LSTM models to capture the complex dynamical trends of COVID-19 spread and to forecast COVID-19 confirmed cases 7, 14, and 21 days ahead for India and its four most affected states: Maharashtra, Kerala, Karnataka, and Tamil Nadu. The RMSE and MAPE errors on the testing data are computed to demonstrate the relative performance of the deep learning models. The predicted COVID-19 confirmed cases for 7, 14, and 21 days for India and its states (Maharashtra, Kerala, Karnataka, and Tamil Nadu), along with the confidence intervals, show that the daily confirmed cases predicted by most of the models studied are very close to the actual confirmed cases per day. The stacked LSTM and hybrid CNN+LSTM models perform best among the six models. Such predictions can help governments take decisions accordingly and create more infrastructure if required.

[1] Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
[2] A theory of growth
[3] Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecasts
[4] A flexible growth function for empirical use
[5] A novel sub-epidemic modeling framework for short-term forecasting epidemic waves
[6] A contribution to the mathematical theory of epidemics
[7] A time-dependent SIR model for COVID-19 with undetectable infected persons
[8] A SIR model assumption for the spread of COVID-19 in different communities
[9] Estimating the parameters of susceptible-infected-recovered model of COVID-19 cases in India during lockdown periods
[10] COVID-19 outbreak prediction with machine learning
[11] Analysis of COVID-19 cases in India through machine learning: A study of intervention
[12] Time series forecasting of COVID-19 transmission in Canada using LSTM networks
[13] Generalization and network design strategies, Connectionism in perspective
[14] Learning representations by back-propagating errors
[15] Long short-term memory
[16] Sequence to sequence learning with neural networks
[17] Nouri-Moghaddam, Modeling and forecasting spread of COVID-19 epidemic in Iran until
[18] Time series forecasting of COVID-19 using deep learning models: India-USA comparative case study
[19] Time series prediction for the epidemic trends of COVID-19 using the improved LSTM deep learning method: Case studies in Russia, Peru and Iran
[20] COVID-19 patient count prediction using LSTM
[21] Prediction of COVID-19 trend in India and its four worst-affected states using modified SEIRD and LSTM models
[22] Comparative study of machine learning methods for COVID-19 transmission forecasting
[23] Forecasting of COVID-19 cases using deep learning models: Is it reliable and practically significant?
[24] Forecasting COVID-19 cases: A comparative analysis between recurrent and convolutional neural networks
[25] Deep learning via LSTM models for COVID-19 infection forecasting in India
[26] Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India
[27] How to develop LSTM models for time series forecasting
[28] LSTM: A search space odyssey
[29] Remaining useful life estimation of engineered systems using vanilla LSTM neural networks
[30] A stacked LSTM for atrial fibrillation prediction based on multivariate ECGs, Health information science and systems
[31] A review of recurrent neural networks: LSTM cells and network architectures
[32] Predicting COVID-19 cases using bidirectional LSTM on multivariate time series
[33] Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM
[34] Deep learning methods for forecasting COVID-19 time-series data: A comparative study
[35] Exploring a long short-term memory based encoder-decoder framework for multi-step-ahead flood forecasting
[36] An encoder-decoder LSTM-based EMPC framework applied to a building HVAC system
[37] Recent advances in convolutional neural networks
[38] Understanding of a convolutional neural network
[39] Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions
[40] Recurrent convolutional neural network regression for continuous pain intensity estimation in video
[41] A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)
[42] A hybrid CNN-LSTM algorithm for online defect recognition of CO2 welding
[43] Fundamental of mathematical statistics. Sultan Chand & Sons
[44] Better deep learning: train faster, reduce overfitting, and make better predictions

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.