title: Prediction of COVID-19 cases using the weather integrated deep learning approach for India
authors: Bhimala, Kantha Rao; Patra, Gopal Krishna; Mopuri, Rajasekhar; Mutheneni, Srinivasa Rao
date: 2021-04-20
journal: Transbound Emerg Dis
DOI: 10.1111/tbed.14102
Accurate advance forecasting of COVID-19 cases plays a crucial role in planning and supplying resources effectively. Artificial Intelligence (AI) techniques have proved their capability in time series forecasting of non-linear problems. In the present study, the relationship between weather factors and COVID-19 cases was assessed, and a forecasting model was developed using long short-term memory (LSTM), a deep learning model. The study found that specific humidity has a strong positive correlation with COVID-19 cases, whereas a negative correlation with maximum temperature and a positive correlation with minimum temperature were observed in various geographic locations of India. The weather data and COVID-19 confirmed case data (1 April to 30 June 2020) were used to optimize univariate and multivariate LSTM time series forecast models. The optimized models were used to forecast daily COVID-19 cases for the period 1 July 2020 to 31 July 2020 with 1 to 14 days of lead time. The results showed that the univariate LSTM model was reasonably good for short-term (1-day lead) forecasts of COVID-19 cases (relative error <20%). Moreover, the multivariate LSTM model improved medium-range forecast skill (1-7 days lead) after including the weather factors. Specific humidity played a crucial role in improving forecast skill, mainly in the West and Northwest regions of India. Similarly, temperature played a significant role in model enhancement in the Southern and Eastern regions of India.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), first emerged in Wuhan, China, in early December 2019 (Shen et al., 2020). Since then, the disease has spread quickly around the world and established local transmission in many countries across the Americas, Europe, Africa and Asia. This rapid spread may be due to a lack of proper information about the disease's etiology and transmission patterns during the early stage of the epidemic (Zhong et al., 2020). On 7 January 2020, the novel SARS-CoV-2 strain was isolated, and its circulation in the population and its role as the cause of COVID-19 were confirmed. On 30 January 2020, the WHO (World Health Organisation) declared the COVID-19 outbreak a public health emergency of international concern (WHO, 2020a), and on 11 March 2020 it was declared a global pandemic (Cucinotta & Vanelli, 2020). The disruption the pandemic has caused to human life, public healthcare systems and economies is unprecedented, and its impacts will continue until a vaccine is developed. During the first wave of the pandemic, many countries imposed lockdowns, shut down non-essential services, adopted social distancing and made face-mask wearing compulsory. As of 22 October 2020, more than 40 million COVID-19 cases and 1.1 million deaths had been reported globally (WHO, 2020b). SARS-CoV-2 belongs to the genus Betacoronavirus, which also includes SARS-CoV-1, Middle East Respiratory Syndrome coronavirus (MERS-CoV), and two other human coronaviruses (HCoV-OC43 and HCoV-HKU1) (Kissler et al., 2020). SARS-CoV-2 has spread faster than its two related predecessors, SARS-CoV-1 and MERS-CoV, possibly because of high transmission by asymptomatic carriers (Bai et al., 2020; Vellingiri et al., 2020). [...] are the most common causes of the common cold and of respiratory illness outbreaks during winter in temperate regions (Killerby et al., 2020; Neher et al., 2020; Su et al., 2016).
SARS-CoV-2 is closely related to the bat-derived viruses bat-SL-CoVZC45 and bat-SL-CoVZXC21 and is distinct from SARS-CoV-1 (~79% similarity) and MERS-CoV (~50% similarity) (Jiang et al., 2020; Lai et al., 2020; Liu et al., 2020). SARS-CoV-2 is deadly because its case fatality rate is much higher than that of influenza (Fauci et al., 2020; de Wit et al., 2016). During the initial period of the outbreak, the case fatality rate (CFR) was 15%; subsequently, as more data emerged, the CFR decreased to between 4.3% and 11.0%, and later to 3.4% (Chen, Zhou, et al., 2020; Wang D. et al., 2020; WHO, 2020b). Currently, the CFR is 2.75% (calculated from the COVID-19 cases and deaths reported worldwide as of 22 October 2020) (WHO, 2020b). Along with other countries, COVID-19 cases have also been reported in India. The first case of COVID-19 in India was identified on 30 January 2020 in Kerala state and was imported from China (Rawat, 2020). As of [...] October 2020, 7.76 million COVID-19 cases and 0.117 million deaths had been reported in India (My Gov; https://www.mygov.in). Environmental factors can affect the epidemiological transmission of many infectious diseases. Several studies have revealed that climate and weather conditions can influence the spatial and temporal distribution of infectious diseases (Dhara et al., 2013; Shuman, 2010). The Coronaviridae viruses SARS-CoV-1 and MERS-CoV also show seasonal variation and favour low temperature and humidity (Casanova et al., 2010). Similarly, at the early stage of the COVID-19 pandemic, researchers reported that temperature had a positive association and humidity a negative association with COVID-19 cases in many regions of the world (Bashir et al., 2020; Briz-Redón et al., 2020; Chen, Liang, et al., 2020; Liu et al., 2020; Ma et al., 2020; Oliveiros et al., 2020; Sahin, 2020; Wang, Hu, et al., 2020; Wang, Tang, et al., 2020).
However, a negative linear relationship between temperature and daily cumulative COVID-19 cases has also been observed (Prata et al., 2020). Many studies have suggested that COVID-19 spreads more in cold and temperate climates than in warm and tropical climates, consistent with the behaviour of seasonal respiratory viruses such as influenza (Bloom-Feshbach et al., 2013). Machine learning and deep learning techniques are branches of Artificial Intelligence (AI) that provide powerful predictive capabilities and can outperform conventional statistical modelling (Beam & Kohane, 2018; Miguel-Hurtado et al., 2016; Singal et al., 2013). Despite their high predictive power, these algorithms are not yet widely used in public health data analysis. Here, we apply a deep learning algorithm to integrated data sets (epidemiological and climate data), deploying a multivariate long short-term memory (LSTM) modelling framework to forecast COVID-19 trends in India. LSTM models have also been used successfully to forecast dengue and influenza (Leonenko et al., 2017; Nadda et al., 2020). Moreover, previous studies have used relative humidity and absolute humidity to understand their role in COVID-19 transmission, whereas studies on the influenza virus show that specific humidity is an important factor for disease transmission. Hence, the present study used specific humidity along with other climatic factors to understand and forecast COVID-19 transmission in India.
All 28 states and 8 Union Territories of India, covering latitude [...] July 2020. The daily meteorological parameters for the study period, comprising temperature (minimum, maximum and mean) and specific humidity (SH), were extracted from the NCEP/NCAR reanalysis data (Kalnay et al., 1996) (https://psl.noaa.gov/).
To understand the impact of weather on COVID-19 cases, a cross-correlation analysis was carried out to identify similarities between the lagged meteorological parameters (X) and the daily count of COVID-19 cases (Y) for different states in India during the period 1 April to 31 July 2020. The cross-correlation coefficients help identify whether antecedent (lagged) meteorological parameters are useful predictors for modelling COVID-19 cases over different states in India. The cross-correlation coefficient at lag d is computed as

$$ r_{XY}(d) = \frac{\sum_{t=1}^{N-d}\left(X_t-\bar{X}\right)\left(Y_{t+d}-\bar{Y}\right)}{\sqrt{\sum_{t=1}^{N-d}\left(X_t-\bar{X}\right)^2}\,\sqrt{\sum_{t=1}^{N-d}\left(Y_{t+d}-\bar{Y}\right)^2}} $$

where t, d and N represent the time in days, the lag in days (0-14), and the total number of days (122) in the time series, respectively, and $\bar{X}$ and $\bar{Y}$ are the means of the N - d overlapping values of X and Y. The block diagram of a basic multi-input LSTM network and the memory transformation between LSTM cells are presented in Figure 1a and b. The LSTM cell consists of three gates with different functions: the input gate ($i_t$), the forget gate ($f_t$) and the output gate ($o_t$) (Figure 1c). The forget gate discards information that is no longer required, the input gate adds new useful information, and the output gate updates the hidden state at every time step. Each gate is a feed-forward neural network with many hidden units, as shown in Figure 1d. The LSTM is represented mathematically by Equations (1)-(5) (Hochreiter & Schmidhuber, 1997), where σ, i, f, o and g represent the sigmoid function, input gate, forget gate, output gate and un-gated input transformation, respectively. The [...] are represented as vectors, and $S_{t-1}$ represents the cell state of the previous time step. The present study used both univariate and multivariate LSTM models to forecast daily COVID-19 cases for different states in India.
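As an illustration of the gate mechanism described above, the following is a minimal NumPy sketch of one forward step of a standard LSTM cell, written with the same symbols (σ, i, f, o, g and the cell state S). It assumes the common formulation in which each gate applies an input weight matrix W, a recurrent weight matrix U and a bias b to the current input and the previous hidden state. These names, the helper functions and the random initialisation are hypothetical and are not taken from the study; a trained model would learn the weights rather than draw them at random.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, the gate activation (sigma)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, s_prev, W, U, b):
    """One forward step of a standard LSTM cell (assumed textbook form).

    W[k]: input weights, shape (hidden, n_features)
    U[k]: recurrent weights, shape (hidden, hidden)
    b[k]: biases, shape (hidden,)
    for each gate key k in "i", "f", "o", "g".
    """
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # un-gated input transformation
    s_t = f_t * s_prev + i_t * g_t  # keep part of the old cell state, add gated new information
    h_t = o_t * np.tanh(s_t)        # output gate controls what the hidden state exposes
    return h_t, s_t


def run_window(xs, W, U, b, hidden):
    """Feed a (days, n_features) window through the cell; return the final hidden state."""
    h = np.zeros(hidden)
    s = np.zeros(hidden)
    for x_t in xs:
        h, s = lstm_step(x_t, h, s, W, U, b)
    return h


def random_params(n_features, hidden, seed=0):
    """Hypothetical small random weights, for illustration only."""
    rng = np.random.default_rng(seed)
    W = {k: rng.normal(0.0, 0.1, (hidden, n_features)) for k in "ifog"}
    U = {k: rng.normal(0.0, 0.1, (hidden, hidden)) for k in "ifog"}
    b = {k: np.zeros(hidden) for k in "ifog"}
    return W, U, b


if __name__ == "__main__":
    # Hypothetical 7-day window with 2 scaled features per day: [cases, specific humidity]
    window = np.random.default_rng(1).random((7, 2))
    W, U, b = random_params(n_features=2, hidden=4)
    print(run_window(window, W, U, b, hidden=4))
```

In a multivariate experiment such as CTL_SH, each row of the input window would hold the scaled case count and the scaled weather variable for one day, and the final hidden state would feed an output layer (omitted here) that produces the next-day forecast.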
The time-series data (1 April to 31 July 2020) selected for the study were divided into two parts: the first three months (April-June) were used for training and the last month (July) for testing. The control experiment (CTL) was conducted with the univariate LSTM model, and the other four experiments (CTL_SH, CTL_Tmax, CTL_Tmin, CTL_Tmean) were conducted with the multivariate LSTM model, each combining the COVID-19 case series with one weather parameter (Table 1). The univariate and multivariate LSTM models were optimized with a minimum-error method (considering different hyper-parameters, such as the number of units in the hidden layer, the number of hidden layers, and so on) and then used for forecasting. State-level COVID-19 cases were forecast (1-day forecast window) for July 2020 with different initial-condition data (lag 1-14 days; Table 1) using the univariate and multivariate LSTM models, and the forecasts were evaluated against observed data. We also generated forecasts with different combinations of the weather parameters and evaluated them against the observed data for the high-prevalence states in India. The relative error is the ratio between the absolute error and the absolute value of the observation. The average relative error of the daily COVID-19 case forecasts for July (31 days) is calculated as

$$ \overline{RE} = \frac{1}{31}\sum_{t=1}^{31}\frac{\left|X_{(m,t)}-X_{(o,t)}\right|}{\left|X_{(o,t)}\right|} $$

where $X_{(m,t)}$ and $X_{(o,t)}$ are the model-forecast and observed COVID-19 cases for day t. The computed average relative error was used to verify the performance of each model with different lags in predicting future COVID-19 cases for the selected states in India. To understand the effect of weather on COVID-19 cases, lag (0-14 days) correlation coefficients (CC) were computed between daily COVID-19 cases and the surface meteorological parameters (SH, Tmax, Tmin, Tmean) for the period 1 April to 31 July 2020. Lags of up to 14 days were considered because COVID-19 symptoms appear after an incubation period that typically ranges from 1 to 14 days.
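A minimal NumPy sketch of the lag-correlation step follows. It assumes that the coefficient at lag d is the Pearson correlation between the weather series on day t and the case series on day t + d, computed over the N - d overlapping days; the function name `lagged_correlation` and the synthetic data are hypothetical, and the study's exact normalisation is not confirmed by the text.

```python
import numpy as np


def lagged_correlation(weather, cases, max_lag=14):
    """Pearson correlation between weather[t] and cases[t + d] for d = 0..max_lag.

    weather: daily meteorological series (e.g. specific humidity), length N
    cases:   daily confirmed COVID-19 cases, same length N
    Returns a dict mapping lag d (days) to the correlation coefficient.
    """
    x = np.asarray(weather, dtype=float)
    y = np.asarray(cases, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("weather and cases must be 1-D series of equal length")
    n = x.size
    if max_lag >= n - 1:
        raise ValueError("max_lag must leave at least two overlapping days")
    coeffs = {}
    for d in range(max_lag + 1):
        x_lagged = x[: n - d]  # weather observed d days earlier
        y_now = y[d:]          # cases on day t + d
        coeffs[d] = float(np.corrcoef(x_lagged, y_now)[0, 1])
    return coeffs


if __name__ == "__main__":
    # Synthetic 122-day example (1 April to 31 July has 122 days):
    # the case series follows the weather signal with a 7-day delay plus noise.
    rng = np.random.default_rng(0)
    humidity = rng.normal(size=122)
    cases = np.roll(humidity, 7) + 0.1 * rng.normal(size=122)
    r = lagged_correlation(humidity, cases)
    best = max(r, key=r.get)
    print(best, round(r[best], 2))
```

Applied to a state's SH, Tmax, Tmin or Tmean series against its daily cases, the returned dictionary gives one coefficient per lag from 0 to 14 days; in the synthetic example the strongest correlation appears at the 7-day delay built into the data.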
The correlation coefficients for lag1, lag7 and lag14 over different states of India are shown in Figure 4. The correlation maps show that specific humidity [...] and Northeast India, but a weak negative association was found over the South India region (Figure 4).
FIGURE 2 Spatial maps of monthly cumulative COVID-19 cases over different states in India during the pre-monsoon (April and May) and monsoon (June and July) seasons of 2020
The present study used three months of data (1 April to 30 June 2020) for training and one month of data (1 July to 31 July 2020) for testing the model. [...] (Figure 5). The univariate LSTM forecasts closely followed the observed cases in these states (Figure 6). However, the major disadvantage of the univariate model is that its forecast skill decreases at longer lead times. Andhra Pradesh and Karnataka are COVID-19-affected states in South India; cases were very low during the pre-monsoon season, but transmission was very rapid in the monsoon season, and more than 0.1 million cases were reported from these states in July. The univariate LSTM model optimized with the confirmed case data performed well (relative error <15% for lag1) in capturing the exponential growth of the pandemic, whereas the multivariate models optimized with the weather data underestimated the confirmed cases in these states. The LSTM model captured not only increasing trends but also the decreasing trend in Delhi (relative error = 15%). The multivariate LSTM model optimized with minimum temperature showed a slight improvement over the univariate LSTM for 2-, 3- and 4-day lead forecasts in Delhi (Figure 5c).
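The forecast evaluation behind these relative-error figures (lag-day input windows, a next-day target, training on April-June and scoring July with the average relative error) can be sketched as below. This is an illustrative harness under stated assumptions, not the study's code: the helper names are hypothetical, 91 training days is simply the length of 1 April to 30 June, and a naive persistence forecast (tomorrow equals today) stands in for the trained univariate or multivariate LSTM.

```python
import numpy as np


def make_windows(series, lag):
    """Turn a daily series into (lag-day input window, next-day target) pairs."""
    series = np.asarray(series, dtype=float)
    if lag < 1 or lag >= series.size:
        raise ValueError("lag must be between 1 and len(series) - 1")
    X = np.array([series[i : i + lag] for i in range(series.size - lag)])
    y = series[lag:]
    return X, y


def july_test_split(series, lag, n_train_days=91):
    """Split windows so that training targets fall in April-June and test targets in July."""
    X, y = make_windows(series, lag)
    first_test = n_train_days - lag  # index of the first window whose target is 1 July
    return (X[:first_test], y[:first_test]), (X[first_test:], y[first_test:])


def average_relative_error(forecast, observed):
    """Mean of |forecast - observed| / |observed| over the test days."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if np.any(observed == 0):
        raise ValueError("relative error is undefined for zero observed cases")
    return float(np.mean(np.abs(forecast - observed) / np.abs(observed)))


def persistence_forecast(windows):
    """Hypothetical stand-in model: tomorrow's cases equal the last day in the window."""
    return windows[:, -1]


if __name__ == "__main__":
    # Synthetic, smoothly growing daily case counts for 122 days (1 April to 31 July).
    cases = 100.0 * np.exp(0.03 * np.arange(122))
    _, (X_test, y_test) = july_test_split(cases, lag=7)
    are = average_relative_error(persistence_forecast(X_test), y_test)
    print(f"{len(y_test)} test days, average relative error {are:.1%}")
```

Replacing `persistence_forecast` with a trained model's predictions and looping over lags 1 to 14 would give one average relative error per lag, the quantity summarised per state and experiment in the skill figures.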
It is also observed that the states located in West and Northwest India (Maharashtra, Madhya Pradesh, Gujarat, Rajasthan, Haryana and Punjab) showed excellent forecasting skill with the multivariate LSTM model (CTL_SH; model optimized with specific humidity and COVID-19 cases) compared with the univariate LSTM model. The correlation coefficient between specific humidity and COVID-19 cases was also significant in these regions. Moreover, the forecasting skill of the model improved with lagged specific humidity (lag1-lag7) over these regions, which is a promising sign for medium-range forecasting (Figure 8). Among all the states, Maharashtra reported the highest number of COVID-19 cases in India. The multivariate LSTM model with specific humidity (CTL_SH) performed better (relative error <8%) with lag7 data (Figure 7a). Similarly, the forecast plot (with one-week advance data) shows that the univariate model (CTL) and the models with the other weather variables (CTL_Tmax, CTL_Tmin and CTL_Tmean) overestimated the daily cases, whereas the specific humidity model (CTL_SH) followed the observed trend closely (Figure 8a). Forecast skill was also adequate with specific humidity for Gujarat (lag1), Madhya Pradesh (lag3), Rajasthan (lag3), Haryana (lag1) and Punjab (lag5) (Figure 8b-f).
FIGURE 3 Spatial-temporal variation of surface meteorological parameters (2m specific humidity, 2m mean temperature, 2m maximum temperature and 2m minimum temperature) during the pre-monsoon and monsoon seasons over India
FIGURE 4 Correlation between confirmed COVID-19 cases and meteorological parameters (2m specific humidity, 2m mean temperature, 2m maximum temperature and 2m minimum temperature) during the period 1 April to 31 July 2020
In highly humid regions (Kerala, Tamil Nadu and West Bengal), forecast skill improved with the multivariate LSTM models optimized with temperature data (Figure 9). Forecast skill was highest at a 1-day lead (relative error <10%) for Tamil Nadu and West Bengal, and it improved with maximum and mean temperature. In Kerala, however, forecast skill was lower (relative error between 20% and 30%) with all variables, and a slight improvement was observed for the model optimized with minimum temperature. The forecast plot clearly shows that the temperature-based LSTM models stay closer to the observations than the humidity-based model in these humid states (Figure 9).
FIGURE 5 Skill (average relative error) of the univariate (CTL) and multivariate (CTL_SH, CTL_Tmax, CTL_Tmin, CTL_Tmean) LSTM models during the test period (1 July to 31 July 2020) for the states of Andhra Pradesh, Karnataka, Delhi, Bihar, Odisha and Uttar Pradesh. L1 to L14 represent the 1 to 14 days of lag data used to forecast the next day's COVID-19 cases
COVID-19 cases started during the winter season (the first case was reported on 30 January 2020), and before the nationwide lockdown (25 March 2020) the largest numbers of cases were reported in Maharashtra and Kerala. Transmission became very rapid after the onset of the monsoon, and the largest numbers of positive cases were reported from Maharashtra, Karnataka, Andhra Pradesh, Tamil Nadu, Uttar Pradesh, Kerala, Delhi and West Bengal. Earlier studies have shown that RNN-based LSTM models have adequate skill in short-range (1-day lead) forecasting of COVID-19 cases (Arora et al., 2020; Shastri et al., 2020). Hence, the present study developed weather-integrated multivariate LSTM models to improve prediction skill from short- to long-range forecasts of daily COVID-19 cases over different states in India.
The output of our proposed model can help planners and health authorities implement appropriate control measures. State-wise predictions can help public health authorities balance the disease load against the capacity of medical facilities, and can also support the resumption of economic activity, without which people may face livelihood challenges. During the early stage of the pandemic, Wu et al. (2020) reported that humidity and temperature affect COVID-19 cases: daily new cases decreased with increasing temperature (a 1°C increase was associated with a 3.08% reduction) and humidity (a 1% increase was associated with a 0.85% reduction). Lin et al. (2020) also studied the effect of temperature and humidity on COVID-19 transmission in Asian countries and observed that high relative humidity with low temperature increases COVID-19 transmission, whereas high humidity with high temperature reduces it. Similarly, to understand the impact of weather on coronavirus survival, Dbouk and Drikakis (2020) conducted a study using heat and mass transfer correlations and found that coronavirus viability is reduced under low-humidity, high-temperature conditions. They also found that high relative humidity increases airborne virus viability at any environmental temperature. COVID-19 transmission rates depend mainly on the evaporation rate of contaminated saliva droplets released from an infected person into the surrounding environment (Dbouk & Drikakis, 2020). The evaporation rate depends mainly on humidity, temperature and wind speed. Contaminated droplets are more resistant to evaporation when the relative humidity is close to saturation, which allows the contaminated droplet cloud to travel longer distances from the source (Dbouk & Drikakis, 2020).
A recent study revealed that droplets larger than 50 µm (released from an infected person while speaking) fall to the ground very quickly, whereas smaller droplets slowly shrink according to the evaporation rate of the surrounding environment and remain airborne for a longer duration (Netz & Eaton, 2020). Hence, higher (lower) relative humidity increases (decreases) airborne virus viability under calm wind conditions, which is a possible pathway for the acceleration of a COVID-19 outbreak. To understand COVID-19 transmission over different states in India, we analysed potential evaporation data during the pre-monsoon and monsoon seasons and present the spatiotemporal values in Figure 10. At the early stage of the pandemic (pre-monsoon season), the largest numbers of cases were reported from Maharashtra, Gujarat, Rajasthan, Delhi and Uttar Pradesh (central, north, west and northwest India), but disease transmission was very low (monthly cumulative cases <20,000) during the pre-monsoon season. Potential evaporation rates were very high (>500 W/m²) in central, north, west and northwest India during the pre-monsoon season owing to high maximum temperatures (>40°C) and low specific humidity (<0.01 kg/kg) in these regions (Figures 3 and 9). Virus viability and travel distance may have been low because of the high maximum temperatures and low specific humidity. The nationwide lockdown and the unfavourable weather conditions during the pre-monsoon season reduced disease transmission over the central, west and northwest states of India.
FIGURE 6 Time series of COVID-19 cases forecast by the univariate (CTL) and multivariate (CTL_SH, CTL_Tmax, CTL_Tmin, CTL_Tmean) LSTM models during the test period (1 July to 31 July 2020) for the states of Andhra Pradesh, Karnataka, Delhi, Bihar, Odisha and Uttar Pradesh
Potential evaporation rates decreased slowly in June (after monsoon onset) and reached very low values (<200 W/m²) in July in south, east and northeast India. These low evaporation rates, driven by low temperatures and high specific humidity, may have increased virus viability in the atmosphere (aggravating airborne transmission), which is a possible reason for the significant increase in COVID-19 cases in South India (Figure 10). Our results suggest that the univariate LSTM model optimized with the confirmed COVID-19 time series performed best for highly affected states such as Andhra Pradesh, Karnataka, Uttar Pradesh, Delhi, Bihar and Odisha. The skill of the univariate model was good in short-range forecasting (lag1) and decreased with increasing lead time. A major finding of the study is that medium-range (1-7 days lead) forecasts showed adequate skill in some states of India when the LSTM models were integrated with time-series weather data, including specific humidity and temperature. The multivariate LSTM models optimized with specific humidity (CTL_SH) showed adequate skill in medium-range forecasts of daily COVID-19 cases over the states located in the west and northwest India region. It was also observed that the de[...]
FIGURE 9 Skill (average relative error) of the univariate (CTL) and multivariate (CTL_SH, CTL_Tmax, CTL_Tmin, CTL_Tmean) LSTM models during the test period (1 July to 31 July 2020) for the states of Tamil Nadu, West Bengal and Kerala. L1 to L14 represent the 1 to 14 days of lag data used to forecast the next day's COVID-19 cases
FIGURE 10 Spatial-temporal variation of potential evaporation rate (W/m²) during the pre-monsoon and monsoon seasons over India for the year 2020
The authors declare that no competing financial interests exist.
The authors declare that an ethical statement is not applicable because the case information has been gathered. The data used in this study are available from the corresponding author upon request.
Srinivasa Rao Mutheneni https://orcid.org/0000-0003-3263-3905
Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India
Presumed asymptomatic carrier transmission of COVID-19
Correlation between climate indicators and COVID-19 pandemic in [...]. Science of The Total Environment (p. 138835)
Big data and machine learning in health care
Latitudinal variations in seasonal activity of influenza and respiratory syncytial virus (RSV): A global comparative review
A spatio-temporal analysis for exploring the effect of temperature on COVID-19 early evolution in Spain
Effects of air temperature and relative humidity on coronavirus survival on surfaces
Roles of meteorological conditions in COVID-19 transmission on a worldwide scale
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study
WHO declares COVID-19 a pandemic
On respiratory droplets and face masks
SARS and MERS: Recent insights into emerging coronaviruses
Climate change & infectious diseases in India: Implications for health care providers
Covid-19: Navigating the Uncharted
Severe acute respiratory illness surveillance for coronavirus disease
Long short-term memory
Indian Council of Medical Research
Laboratory surveillance for SARS-CoV-2 in India: Performance of testing & descriptive epidemiology of detected COVID-19
An emerging coronavirus causing pneumonia outbreak in Wuhan, China: Calling for developing therapeutic and prophylactic strategies
The NCEP/NCAR 40-year reanalysis project
Human coronavirus circulation in the United States
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges
Influenza peaks forecasting in Russia: Assessing the applicability of statistical methods
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Containing the spread of coronavirus disease 2019 (COVID-19): Meteorological factors and control strategies
Impact of meteorological factors on the COVID-19 transmission: A multicity study in China
Effects of temperature variation and humidity on the death of COVID-19 in Wuhan
Comparing machine learning classifiers and linear/logistic regression to explore the relationship between hand dimensions and demographic characteristics
Prevalence of SARS-CoV-2 infection in India: Findings from the national serosurvey
Dengue Fever Detection using Long Short-term Memory Neural Network
Potential impact of seasonal forcing on a SARS-CoV-2 pandemic
Physics of virus transmission by speaking droplets
Role of temperature and humidity in the modulation of the doubling time of COVID-19 cases
Temperature significantly changes COVID-19 transmission in (sub) tropical cities of Brazil. The Science of the Total Environment
Coronavirus in India: Tracking country's first 50 COVID-19 cases; what numbers tell
Time series forecasting of petroleum production using deep LSTM recurrent networks
Impact of weather on COVID-19 pandemic in Turkey
Time series forecasting of Covid-19 using deep learning models: India-USA comparative case study
Novel coronavirus infection in children outside of Wuhan
Global climate change and infectious diseases
Machine learning algorithms outperform conventional regression models in predicting development of hepatocellular carcinoma
Epidemiology, genetic recombination, and pathogenesis of coronaviruses
COVID-19: A promising cure for the global panic
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan
High temperature and high humidity reduce the transmission of COVID-19
Effects of temperature and humidity on the daily new cases and new deaths of COVID-19 in 166 countries
Early prediction of the 2019 Novel Coronavirus Outbreak in the Mainland China based on simple mathematical model