key: cord-0544269-0ofq8xil authors: Tiwari, Animesh; Gupta, Rishabh; Chandra, Rohitash title: Delhi air quality prediction using LSTM deep learning models with a focus on COVID-19 lockdown date: 2021-02-21 journal: nan DOI: nan sha: 340498bc40a40035d50af1e36d6b52e9fe65c34d doc_id: 544269 cord_uid: 0ofq8xil

Air pollution has a wide range of implications for agriculture, the economy, road accidents, and health. In this paper, we use novel deep learning methods for short-term (multi-step-ahead) air-quality prediction in selected parts of Delhi, India. Our deep learning methods comprise long short-term memory (LSTM) network models, including more recent variants such as bidirectional-LSTM and encoder-decoder LSTM models. We use a multivariate time series approach that attempts to predict air quality for 10 prediction horizons covering a total of 80 hours, and we provide a long-term (one month ahead) forecast with uncertainties quantified. Our results show that the multivariate bidirectional-LSTM model provides the best predictions despite the impact of COVID-19 on air quality during the full and partial lockdown periods. The effect of COVID-19 on air quality was significant during the full lockdown; however, there was an unprecedented growth of poor air quality afterwards.

The global human population has risen more than fourfold during the last century [1]. Research shows that most of this growth is attributed to metropolitan areas in the less developed regions of the globe [2, 3]. A consequence of this growth in poorly developed states is low air quality. We note that 80% of global cities [3] and 98% of cities in middle-income countries fail to meet the proposed levels of air quality [4, 5]. An increase in air pollution results in economic losses and reduced visibility, contributes to faster climate change that leads to extreme weather conditions, and causes millions of premature deaths annually [6].
The major factor in air pollution is anthropogenic fine particulate matter (PM), i.e. PM2.5 (particles with an aerodynamic diameter smaller than 2.5 micrometres) [7, 8, 9, 10]. Although PM2.5 concentrations are two to five times higher in developing countries, most air-quality assessments and estimates are carried out for developed countries [11]. Delhi is one of the most prominent growing cities, with an estimated population of more than 19.3 million [12]. Its population density and growth over the last few decades, together with rapid industrial expansion, have pushed air pollution to hazardous levels, depriving people of one of the primary amenities of life, clean air. The World Economic Forum recently reported that India has 6 of the world's 10 most polluted cities, with Delhi among the most polluted [13]. Research shows that polluted air is one of the prominent causes of premature deaths [14] and that the average life span decreases with increasing rates of air contamination [15]. It is well known that, apart from industrial pollution, agricultural fires are one of the significant contributors to air pollution in Delhi [16]. Apart from strategies that focus on the management of industrial waste [17], it is important to model and forecast air quality in both the short and long term. Machine learning methods have been promising for forecasting temporal sequences, and their application to air-quality forecasting has gained attention recently [18, 19, 20]. Forecasting models can be used to develop strategies for evaluating future hazardous levels of the air quality index and alerting the general public.

Email addresses: animesh18a@iitg.ac.in (Animesh Tiwari), rishabhgupta05@gmail.com (Rishabh Gupta), rohitash.chandra@sydney.edu.au (Rohitash Chandra)

Forecasting models for air pollution concentrations can be broadly classified into two major categories: simulation-based approaches and data-driven approaches such as statistical or machine learning methods. Simulation-based methods incorporate physical and chemical models for generating meteorological and background parameters to simulate the emission, transport and chemical transformation of air pollution [21, 22]. However, they suffer from numerical model uncertainties, and the parameterization of aerosol emissions is restricted by the lack of data [23]. Data-driven approaches exploit statistical and machine learning techniques to detect patterns between predictors and dependent variables in temporal sequences [24, 25, 26, 27, 28]. Machine learning methods can be used to identify the exposures relevant to health outcomes of interest within high-dimensional data [29]. Advances in deep learning methods give further motivation for their application to air-quality prediction. Recurrent neural networks (RNNs) are prominent deep learning models [30] suited to modelling temporal sequences, especially those involving long-term dependencies [31, 32, 33, 30, 34]. Long short-term memory networks (LSTMs) were developed [33] to address the limitations of canonical RNNs in learning long-term dependencies in sequences [35, 36]. Gated recurrent unit (GRU) networks [37, 38] are simpler to implement but provide performance similar to LSTMs. Bidirectional RNNs connect two hidden layers of opposite directions to the same output, so that the output layer can get information from past and future states simultaneously [39]. This led to bidirectional-LSTMs for phoneme classification [40], which performed better than standard RNNs and LSTMs and have the potential to be used for forecasting air-pollution time series.

The coronavirus disease 2019 (COVID-19) is an infectious disease [41, 42, 43] which became a global pandemic [44] with a major impact on our lifestyle.
COVID-19 forced many countries to close their borders and enforce a partial or full lockdown, with a devastating impact on the world economy [45, 46, 47]. There have been studies relating the effect of environment and air pollution on COVID-19 and vice versa [48, 49], including studies of China [50], Kazakhstan [48] and Brazil [49]. In some cases, COVID-19 lockdowns with reduced traffic were shown to reduce air pollution, while in others they did not make a significant impact due to meteorological conditions and factors such as industrial pollutants [51, 49].

Although machine learning has been used for forecasting air quality, there is scope for improving the forecasts using the latest machine learning models, in particular deep learning models such as recurrent neural networks. Given this motivation, we focus on the air-quality index of Delhi, which has lately reached hazardous levels. We note that air quality in Delhi improved significantly after the onset of the COVID-19 pandemic [52]; however, this was due to government lockdowns, and air quality can deteriorate again when the restrictions are eased. Hence, it is important to develop robust forecasting models that remain applicable during the lockdown periods of the COVID-19 pandemic. In this paper, we use novel deep learning methods for short-term (multi-step-ahead) air-quality prediction in selected parts of Delhi, India. Our deep learning methods comprise long short-term memory (LSTM) networks, bidirectional-LSTMs and encoder-decoder LSTMs. We use a multivariate time series approach that attempts to predict air quality for 10 prediction horizons covering a total of 80 hours, with 8 hours for each prediction horizon. We also provide a long-term (one month ahead) forecast with uncertainty quantification, obtained by feeding predicted values back into the model. We first visualise the air quality indicators before and during coronavirus (COVID-19) restrictions in Delhi.
We investigate the impact of COVID-19 on air quality and the ability of the models to provide quality predictions before and during COVID-19. Our models use data that covers the major seasons and the effect of COVID-19, and we consider both multivariate and univariate approaches for prediction.

The rest of the paper is organised as follows. Section 2 presents a background and literature review of related work. Section 3 presents the proposed methodology and Section 4 presents experiments and results. Section 5 provides a discussion and Section 6 concludes the paper with a discussion of future work.

There have been studies relating the effect of environment and air pollution on COVID-19 and vice versa [48, 49]. Zhu et al. [50] found a significant relationship between air pollution and COVID-19 infection in China. Kerimray et al. [48] assessed the impact of COVID-19 in large cities in Kazakhstan and found that the temporal reduction in pollution may not be directly attributable to the lockdown, owing to favorable meteorological variations during the period; however, the spatial effects of the lockdown on pollution levels were clear. Moreover, other non-traffic sources such as coal-based power plants contributed substantially to the pollution level. Dantas et al. [49] studied the impact of the COVID-19 partial lockdown on the air quality of the city of Rio de Janeiro, Brazil. The authors reported that carbon monoxide levels showed the most significant reductions during the partial lockdown, while nitrogen oxide decreased to a lesser extent due to industrial and diesel input. The air quality index (with PM10 concentration) was only reduced during the first partial lockdown week. Bao and Zhang [53] reviewed 44 cities in China to determine whether COVID-19 lockdowns had an effect on air pollution.
The authors showed that the lockdowns of 44 cities reduced human movements by 69.85%, and that the reduction in the air quality index was mediated by human mobility. Li et al. [54] studied air quality changes during the COVID-19 lockdown over the Yangtze River Delta region in eastern China and showed that ozone did not show any reduction. Moreover, even during the lockdown, background and residual pollution remained high, with sources mostly from industry. Dutheil and Naval [55] investigated COVID-19 as a factor influencing air pollution in China. The authors argued that the COVID-19 pandemic might paradoxically have decreased the total number of deaths during the period by drastically decreasing the number of fatalities due to air pollution, apart from positive benefits in reducing preventable non-communicable diseases. Wang et al. [51] showed that, in the case of China, severe air pollution events were not avoided by reduced activities during the COVID-19 outbreak due to adverse meteorological events. The authors highlight that a large emissions reduction in transportation and a slight reduction in industry would not help avoid severe air pollution, especially when meteorology is unfavorable.

There have also been studies examining whether the rise in COVID-19 infections is correlated with changes in weather. Tosepu et al. [56] studied the correlation between weather and the COVID-19 pandemic in Jakarta, Indonesia, taking into account temperature levels, percentage humidity, and amount of rainfall. The authors reported that only the average temperature was significantly correlated with the COVID-19 pandemic. Ma et al. [57] studied the effects of temperature variation and humidity on COVID-19 deaths in Wuhan, China.
The authors collected daily COVID-19 death numbers, meteorological parameters and air pollutant data, and used a generalized additive model to explore the effect of temperature, humidity and diurnal temperature range on the daily death counts of COVID-19. The authors reported that temperature variation and humidity may be important factors affecting COVID-19 mortality.

We review machine learning methods used for air quality forecasting in the past decades. Machine learning methods have achieved tremendous success in a variety of areas for air quality forecasting [24, 25, 26, 27, 28]. Although neural networks have advantages over traditional statistical methods in air quality forecasting, they have room for improvement [58] due to challenges that include computational expense, sub-optimal convergence, over-fitting, and noisy data. A further challenge lies in configuring the network topology and model parameters, which affect prediction performance. Corani [59] used feedforward and pruned neural networks to predict hourly PM10 concentrations on the basis of data from the previous day. Jiang et al. [?] explored multiple models, including a physical and chemical model, a regression model, and a multilayer perceptron, on the air pollutant prediction task; their results show that statistical models are competitive with the classical physical and chemical models. Machine learning is an active field of research, and novel tools and techniques regularly emerge for more refined modeling of specific problems. Some recent works use straightforward approaches such as box models, Gaussian models and linear statistical models, which are easy to implement and allow for the rapid calculation of forecasts [60]. Fu et al. [61] applied a rolling mechanism and gray model to improve traditional neural network models. Chang et al. [62] used aggregated LSTM networks and compared the results with support vector regression and gradient boosted tree regression. Karimian et al. [23] implemented three machine learning methods to forecast air quality, given by PM2.5 concentrations, over different time intervals: multiple additive regression trees, a deep feedforward neural network, and a hybrid model based on LSTM networks. The LSTM model was the most effective for forecasting and controlling air pollution. Xiao et al. [63] used a parameterised non-intrusive reduced order model based on proper orthogonal decomposition for model reduction of pollutant transport equations to provide rapid-response urban air pollution predictions and controls. Zhu et al. [64] focused on refined modeling for predicting hourly air pollutant concentrations on the basis of historical meteorological and air pollution data.

Data-driven machine learning methods present an opportunity to simultaneously assess the impact of multiple air pollutants on health outcomes. There is growing evidence that early-life exposure to ambient air pollution affects neurodevelopment in children [65]. One study showed that air pollution is a potential risk factor for obesity, with higher body-mass index in adults, which warrants further investigation of other health effects [66]. A systematic review of schoolchildren's exposure to air pollution during the daily commute highlighted studies linking this exposure with adverse cognitive outcomes and severe wheeze in asthmatic children [67]. Furthermore, ambient air pollution is associated with reduced lung function and other respiratory conditions among children [68]. Apart from health, air pollution has taken a toll on various other sectors, including agriculture.
Industrial air pollution has a drastic effect on agricultural production, as shown by a study in China which found lower marginal products and altered relationships between labour, capital and other factors [69]. Finally, air pollution has a drastic effect on development and the economy. A study on the relationship between air pollution and stock returns further showed that industrial air pollution significantly reduces the technical efficiency of agricultural production [70].

According to the United Nations 2016 report, Delhi had an estimated population of 26 million in the greater metro area, which is projected to rise to 36 million by 2030, and it will remain the second most populated city after Tokyo [71]. The current population density is also one of the world's largest, which will continue to pose challenges to air quality and health. According to a 2018 study by the Indian Ministry of Earth Sciences, the number of vehicles has increased fourfold since 2000, which has been a major factor in air pollution in India, including PM2.5 and hazardous nitrogen oxide [72]. In the past five years, the air quality index of Delhi has generally been at a moderate level between January and September. It then deteriorates drastically to Very Poor (301-400), and further to Severe (401-500) or Hazardous (500+) levels, during October to December due to various factors [73, 74]. The air pollution status in Delhi has undergone many changes in terms of the levels of pollutants and the control measures taken to reduce them. Sulian et al. [75] provide an evidence-based insight into the status of air pollution in Delhi, its effects on health and the control measures instituted.

Meteorological conditions, such as regional and synoptic meteorology, are significant in determining air pollutant concentrations [76, 77, 78, 79]. Lower wind speed (weak dispersion/ventilation) can result in higher concentrations of traffic pollutants [80].
However, strong wind might form dust storms by lifting particles from the ground [81]. Higher humidity levels are associated with higher concentrations of air pollutants such as PM2.5, carbon monoxide (CO), nitrogen dioxide (NO2) and sulfur dioxide (SO2) [82]. In addition, high humidity often indicates precipitation events, which result in heavy wet deposition and thus lower levels of air pollutants [83]. Attenuated visibility is mainly caused by the interaction of particle compositions and light [84, ?]. Thus, low visibility can be considered a strong indicator of high PM2.5 concentrations. The formation of some significant air pollutants such as ozone (O3) is often reduced by cloud cover, since clouds absorb and scatter solar radiation [85, 86]. We therefore take these meteorological variables into consideration for predicting air quality.

We chose the four most polluted areas in Delhi and its surrounding districts, known as the National Capital Region (NCR), for our study. Bawana was the most polluted area with an air quality index of 497, followed by Delhi Technological University (DTU) (487), Anand Vihar (484) and Vivek Vihar (482), as shown in Figure 3. We consider 12 parameters in our study, with data taken from the Central Pollution Control Board, India [87].

Recurrent neural networks (RNNs) are a class of artificial neural networks which have gained a lot of popularity in recent years. The Elman RNN [31, 88] is one of the earliest architectures and is considered a prominent example of simple recurrent networks trained by the backpropagation through time (BPTT) algorithm [32]. In the early days, there was much research on modelling dynamic systems using simple RNNs, and it was shown that RNNs perform better than feedforward networks in knowledge representation tasks [89, 90, 91].
Backpropagation through time (BPTT) [32] uses error backpropagation to train RNNs by gradient descent, in a way similar to feedforward neural networks. The major difference is that the error is back-propagated through a deeper network architecture that features states defined by time; however, BPTT experiences the problems of vanishing and exploding gradients in learning tasks which involve long-term dependencies or deep network architectures [92]. The long short-term memory (LSTM) network [33] was developed to overcome these fundamental problems of deep learning [30]. LSTM networks use memory cells and gates to remember long-term dependencies in temporal sequences far better. The memory cells are trained in a supervised fashion using an adaptation of the BPTT algorithm that considers the respective gates [33]. More recently, the Adam optimiser, which features an adaptive learning rate, has been prominent in training LSTM models via BPTT [93].

One shortcoming of conventional RNNs is that they are only able to make use of previous context in a sequence to predict future states. Bidirectional RNNs [94] address this by processing the data in both directions with two separate hidden layers, which are then fed forward to the same output layer. Bidirectional RNNs are simply two independent RNNs placed together in a structure that allows the networks to have both backward and forward information about the sequence at every time step. The combination of bidirectional RNNs with the LSTM model gives the bidirectional LSTM model (BD-LSTM), which has the ability to access long-range context in both input directions [95]. BD-LSTMs are designed for input sequences whose start and end are known beforehand.
They take both the future and past states of each element of a sequence into consideration, where one LSTM processes the information from the start to the end of the sequence and the other from the end to the start. Their combined outputs predict the corresponding labels, if available, at each time step [30]. This approach differs from the unidirectional approach in that information from the future and past LSTM model states is combined; hence, it preserves information from both past and future at each time step. BD-LSTMs have been prominent in sequence processing problems such as phoneme classification [95], continuous speech recognition [96], speech synthesis [97] and sentence classification [98].

Sequence to sequence models lie behind numerous systems which we face on a daily basis [99]; such models aim to map a fixed-length input to a fixed-length output, where the lengths of the input and output may differ. In multi-step and multivariate analysis, both the input and output are variable, with potentially different lengths. This problem is analogous to machine translation between natural languages, where a sequence of words in the input language is translated to a sequence of words in the output language. Recently, it has been shown how to effectively address the sequence to sequence problem with encoder-decoder LSTM networks (ED-LSTM). ED-LSTM handles variable-length inputs and outputs by first encoding a given input sequence $X_{ip} = (x_1, x_2, \ldots, x_n)$ of length $n$, one element at a time, into a latent vector representation; the output is a sequence $Y = (y_1, y_2, \ldots, y_m)$ of length $m$. In the encoding stage, ED-LSTM creates a sequence of hidden states for the input sequence, and in the decoding stage it defines a distribution over output states given the input sequence. The main goal of the model during training is to maximize the conditional probability $P(Y \mid X_{ip})$ of mapping the input sequence to the output sequence.
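
For readers who want the training objective in explicit form, the standard encoder-decoder formulation from the sequence-to-sequence literature factorises the conditional probability over the output steps. The block below is a sketch under the assumption that ED-LSTM follows that standard formulation; the encoder summary vector $v$ and the use of a log-likelihood are notation introduced here for illustration, not taken from the paper.

```latex
% Assumed notation: v is the fixed-length latent vector that the encoder
% produces from the input sequence (x_1, ..., x_n).
P(y_1, \ldots, y_m \mid x_1, \ldots, x_n)
  = \prod_{t=1}^{m} P(y_t \mid v, y_1, \ldots, y_{t-1})

% Training then maximises the log of this quantity over the training pairs:
\log P(y_1, \ldots, y_m \mid x_1, \ldots, x_n)
  = \sum_{t=1}^{m} \log P(y_t \mid v, y_1, \ldots, y_{t-1})
```

Each factor is the decoder's distribution for one output step given the encoded input and the outputs produced so far, which is why the input length $n$ and output length $m$ are free to differ.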
ED-LSTMs have previously been used in many tasks, including text simplification [100], automatic speech recognition [101] and grapheme to phoneme conversion [102].

The air-pollution dataset for four different monitoring stations was collected from the Central Pollution Control Board (CPCB) [87], Government of India (https://cpcb.nic.in/). Figure 1 shows the framework diagram for our experimental setup. We pre-process the dataset extracted from the four monitoring stations in Delhi (Figure 3, with map adopted from the website in footnote 2) for our proposed deep learning framework, as shown in Figure 1. After collecting the data, we fill the missing values of a particular feature with the median of its available neighboring values. The missing values may be due to inconsistency in recording the pollutant values at different monitoring stations. In order to train the LSTM models, we transform the two-dimensional (2D) dataset of features and temporal sequences, sampled at eight-hour intervals, into a three-dimensional (3D) format [103] using a sliding window technique, in which a window of size $N$ moves over the original 2D dataset of size $(r, c)$, where $r$ represents the number of rows (features) and $c$ represents the number of columns in the dataframe. This results in a final dataset of $M$ windows with dimensions $(M, N, r)$, which is fed into the models to give $N_{out}$ multi-step-ahead prediction values, as shown in Figure 2. We break the dataset into a feature set $X_i = (x_1, x_2, x_3, \ldots, x_{r-1})$, $i = 1, \ldots, N$, for the multivariate model and $X_i = (x_r)$ for the univariate model, with labels $Y_j = (y_1, y_2, \ldots, y_{N_{out}})$, $j = 1, \ldots, M$, as multi-step-ahead predictions. For the multivariate model, the label consists of the PM2.5 value, while the other pollutant and meteorological values constitute the feature vector; for the univariate model, past PM2.5 values constitute the feature vector. Our framework for the respective LSTM models is as follows.
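
Before turning to the model configuration, the sliding-window transformation described above can be illustrated with a small sketch. It is not the authors' code: the function name `make_windows`, the row-per-time-step layout and the toy data are assumptions made here for illustration, and the default window sizes simply follow the values used in the paper's configuration (5 past steps, 10 steps ahead).

```python
def make_windows(rows, target_index, n_in=5, n_out=10):
    """Slide a window over time-ordered rows to build (inputs, labels) pairs.

    rows         : list of per-time-step feature lists (one row per 8-hour step)
    target_index : column holding the value to predict (PM2.5 in the paper)
    n_in         : past time steps per input window (N in the text)
    n_out        : future steps per label vector (N_out in the text)
    """
    inputs, labels = [], []
    last_start = len(rows) - n_in - n_out
    for start in range(last_start + 1):
        past = rows[start:start + n_in]
        future = rows[start + n_in:start + n_in + n_out]
        # Multivariate input: every column except the target column.
        inputs.append([[v for j, v in enumerate(r) if j != target_index] for r in past])
        # Label: the target column over the next n_out steps.
        labels.append([r[target_index] for r in future])
    return inputs, labels


# Toy data (hypothetical): 20 time steps, 3 columns, target in the last column.
toy = [[t, 10 * t, 100 * t] for t in range(20)]
X, Y = make_windows(toy, target_index=2)
print(len(X), len(X[0]), len(X[0][0]), len(Y[0]))
```

With 20 time steps, the sketch yields 6 windows, each holding 5 past steps of 2 features and a label vector of 10 future target values. A univariate variant would keep only the target column in the input window instead of dropping it.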
We use $N_f = 11$ features (different pollutant values other than PM2.5, as shown in Table 1) for our multivariate model, and a window of $N = 5$ that captures the number of time steps in the past. The window determines how far the LSTM model unfolds in time, and the number of features determines the number of input neurons in the respective LSTM models. We define $N_{out} = 10$ as the number of output neurons in the LSTM model, which denotes the number of steps ahead for which PM2.5 concentration is predicted. Each step ahead corresponds to an 8-hour interval, and hence the respective models predict 80 hours ahead. In other words, we use 40 hours (5 steps) of past information to predict 80 hours (10 steps) of the future trend of the time series. In the case of the univariate model, we only use PM2.5 as a feature to predict its future trend; hence, the respective LSTM models feature 1 input neuron and 10 output neurons.

In our experiments, we compare the performance of the three LSTM models presented earlier for univariate and multivariate prediction. We first analyse the impact of COVID-19 lockdowns on air quality and then evaluate the different models for multi-step-ahead prediction of air quality (PM2.5). Our experiments consider different data setups that feature data before and after COVID-19 in order to evaluate the effect on air quality and model accuracy. The visualisations and experiments are organised as follows.
• Provide an analysis of the concentration of PM2.5 for different monitoring stations and for different time intervals, including the COVID-19 lockdown;
• Compare the performance of the three LSTM network models using the Adam optimizer;
• Create training data with different strategies that include and exclude the COVID-19 time-span, and compare the performance of the best LSTM model;
• Compare univariate and multivariate approaches using the best LSTM model;
• Provide a one-month forecast using the best LSTM model, with uncertainty quantification from multiple experimental runs.

The technical details of our framework for the respective LSTM models are as follows. We considered different numbers of hidden neurons and other hyper-parameters (such as learning rate) in our trial experiments. We define the topology of the different models in terms of input, hidden layers and output in Table 5. We use the Adam optimizer [93] with a batch size of 20 for 200 epochs and rectified linear unit (ReLU) activation in the hidden layers for the three LSTM models. We also compare our results with a feedforward neural network (FNN) trained with the Adam optimiser. In the respective experiments, we compare the performance of the different models for different time steps using training data from Jan 2019 to May 2020 (pre-COVID-19 lockdown) and test data from June 2020 to Dec 2020 (during and partial COVID-19 lockdown). We evaluate our models by training both without and with shuffling of the time-series data windows. Shuffling here refers to randomly picking different training data windows (regions), each comprising past time steps and the corresponding output time steps. Hence, with shuffling, the training data features peak and off-peak regions of COVID-19, as different weeks can appear in the training data rather than consecutive weeks. Furthermore, we test the best model in two different ways. First, we train it on seasonal data comprising observations from February to September 2019, which excludes the COVID-19 lockdown.
We test the model on the respective months of 2020, which correspond to the COVID-19 lockdown and partial lockdown periods; in this case, we use 50 epochs of training with our best LSTM model. Second, we use a univariate time-series approach for multi-step-ahead prediction of PM2.5 values and test the performance using our best model. In this case, we use 1000 epochs of training with our best LSTM model. In all our experimental setups using the different LSTM models and the FNN, we further split the test dataset in the ratio 1:1 into a validation dataset, used for tuning the model during trial experiments, and a remaining portion used for evaluating the models.

The prediction performance is measured by root mean squared error (RMSE) as follows

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

where $y_i$ and $\hat{y}_i$ are the actual value and the predicted value, respectively, and $N$ refers to the total length of the data. We report the RMSE for different horizons and also the mean RMSE over all prediction horizons. We conduct 30 independent experimental runs with different weight initialisations in the respective LSTM models and provide the mean and standard deviation in all our results.

We first provide data visualisation and analyse the effect of COVID-19 on air pollution in Delhi in comparison with previous years. Figure 5 presents the PM2.5 values of the stations in Anand Vihar, Bawana, Delhi Technological University and Vivek Vihar from 1 January 2018 to 10 December 2020, which are used as the dataset for this study. We find some missing data in the initial months of 2018 for some stations; therefore, we consider the data from 1 January 2019 to 10 December 2020 for building our respective models and strategies. Table 6 presents a summary of the visualisation in Figure 5, where the seasons are quantitatively highlighted.
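
The evaluation protocol described earlier in this section (RMSE per prediction horizon, the mean over horizons, and a summary across independent runs) can be sketched in a few lines. This is an illustration rather than the authors' implementation: the helper names are hypothetical, and the 95% interval uses a normal approximation (1.96 standard errors), which is an assumption since the exact interval construction is not stated.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_per_horizon(actual, predicted):
    """RMSE for each prediction horizon.

    actual, predicted: lists of label vectors, one vector of n_out values per window.
    """
    n_out = len(actual[0])
    return [rmse([a[h] for a in actual], [p[h] for p in predicted]) for h in range(n_out)]


def mean_and_ci95(values):
    """Mean and half-width of a normal-approximation 95% interval (assumed form)."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Toy check (hypothetical numbers): two windows, two horizons.
actual = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[1.0, 4.0], [3.0, 2.0]]
per_horizon = rmse_per_horizon(actual, predicted)  # first horizon exact, second off by 2
overall = statistics.mean(per_horizon)             # mean RMSE over all horizons
```

In the paper's setting, `rmse_per_horizon` would be evaluated once per experimental run, and `mean_and_ci95` would then be applied to the 30 per-run values for each horizon.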
We observe higher values of PM2.5 for the months from October to February on a seasonal basis, after which the values start to descend from March. Although we find lower values of PM2.5 during March-June every year, we observe a significant decrease for the months of interest (March-June 2020), which belong to the COVID-19 full lockdown period in India, when compared to the respective months in 2018 and 2019, as shown in Table 6 and Figure 5. We find a similar decrease in PM2.5 values for all the monitoring stations, while Bawana records the highest values for 2019 and 2020 among the stations. The partial lockdown period includes the months after June 2020, as highlighted in Figure 5.

[...] more independent feature sets. Higher positive correlation values are represented as dark brown panels, while higher negative correlation values are represented as dark blue panels in the different correlation heatmaps. We observe higher correlation values of NOx with NO and NO2, which is expected because they are nitrogenous pollutants and show higher positive dependence. We also observe high positive correlation values between PM2.5 and PM10 for all the monitoring stations.

We report the mean and (±) 95% confidence interval of the train and test RMSE for different prediction horizons for each model from 30 experimental runs. Table 6 shows the mean and (±) 95% confidence interval of PM2.5 concentration for the different monitoring stations for the months March to June in the years 2018, 2019 and 2020. We selected this particular interval of months to compare PM2.5 concentration in consecutive years for the COVID-19 lockdown months. We observe a significant decrease in PM2.5 concentration in 2020 for all monitoring stations as compared to 2019, as a consequence of the COVID-19 lockdown.
In our first set of experiments, we compare the performance of different multivariate LSTM models (LSTM, ED-LSTM and BD-LSTM), all trained using the Adam optimizer on the dataset collected from the monitoring station at Anand Vihar. We also show results for a canonical feedforward neural network trained with the Adam optimiser (FNN-Adam), and for BD-LSTM with shuffled training data (BD-LSTM*), univariate data (UBD-LSTM) and seasonal data (SBD-LSTM) at Anand Vihar. Note that all LSTM models are multivariate except for UBD-LSTM. Figure 6 (a) shows the RMSE for each prediction horizon for 10-step-ahead prediction, and Figure 6 (b) shows the mean RMSE for the entire train and test datasets for Anand Vihar using the different LSTM models. We show bar plots with the mean and (±) 95% confidence interval from 30 experimental runs. Note that a lower RMSE indicates better performance. We find that the prediction RMSE in general increases across prediction horizons for all the models; this is natural, as we use a fixed window size as input to predict the multi-step-ahead values, and the results deteriorate as the gap between the known inputs and the predicted value grows with increasing prediction horizon. In Figure 6 (b) and Table 7, we find that the multivariate BD-LSTM performs the best on the train dataset and generalizes well in test performance. ED-LSTM is the worst performer in this case, with the highest RMSE on both train and test datasets in comparison to the other models. In Figure 6 (a) and Table 7, we observe that multivariate BD-LSTM generally has the best performance, with significant improvement for step sizes 6-10 when compared to the other models. Since all the monitoring stations belong to the Delhi region, we use the model which performs the best on the dataset from Anand Vihar to analyse the performance on the datasets from the other stations, namely Bawana, DTU and Vivek Vihar.
Taking into account the results for Anand Vihar, we consider BD-LSTM our best model and apply it to the other monitoring stations. We also consider different strategies for training BD-LSTM and analyse their performance on the datasets from the different monitoring stations: shuffling the input windows when training the BD-LSTM model (BD-LSTM*), univariate time-series analysis using the BD-LSTM model (UBD-LSTM), and training the BD-LSTM model on seasonal data (SBD-LSTM). In the results for Anand Vihar (Figure 7 (a) and Table 7), BD-LSTM performs better than BD-LSTM* on the train dataset with almost similar performance on the test dataset, while UBD-LSTM has the highest RMSE on both train and test datasets, with the widest confidence interval (error) of prediction. In the results for Bawana (Figure 7 (b) and Table 8), BD-LSTM performs the best on the train dataset in comparison to BD-LSTM* and UBD-LSTM, while BD-LSTM* performs the best on the test dataset. As for Anand Vihar, UBD-LSTM has the highest RMSE on both train and test datasets and the widest confidence interval of prediction. In the results for DTU (Figure 7 (c) and Table 9), UBD-LSTM performs better than BD-LSTM and BD-LSTM* on the train dataset, while BD-LSTM* performs the best of the three on the test dataset; UBD-LSTM again has the widest confidence interval of prediction on both train and test datasets. In the results for Vivek Vihar (Figure 7 (d) and Table 10), BD-LSTM* performs the best on the train dataset in comparison to BD-LSTM and UBD-LSTM, while BD-LSTM performs the best on the test dataset. UBD-LSTM has the worst performance on both train and test datasets, with the widest confidence interval for the predictions.

Figure panel caption: (a) RMSE for different prediction horizons for Anand Vihar.
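The windowed training samples and the shuffling strategy behind BD-LSTM* can be illustrated with the sketch below. It assumes that shuffling reorders whole input windows while keeping the time order inside each window, which is one reading of the description above; the helper names `make_windows` and `shuffle_windows`, the window size and the toy series are hypothetical.

```python
import random


def make_windows(series, window, horizon, target_index=0):
    """Split a multivariate series into (input_window, targets) samples.

    series: list of time steps, each a list of feature values.
    window: number of past time steps used as input.
    horizon: number of future steps of the target feature to predict.
    """
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        x = series[start:start + window]
        future = series[start + window:start + window + horizon]
        y = [row[target_index] for row in future]
        samples.append((x, y))
    return samples


def shuffle_windows(samples, seed=0):
    """Return the samples in random order; each window keeps its internal order."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy series with two features per time step (illustrative values only).
series = [[float(t), 10.0 * t] for t in range(8)]
samples = make_windows(series, window=3, horizon=2)
shuffled = shuffle_windows(samples, seed=42)
print(len(samples), samples[0])
```

A univariate variant in the spirit of UBD-LSTM would keep only the target column in each input window, for example `[[row[0]] for row in x]`.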
For the seasonal model (SBD-LSTM), we find the lowest RMSE on both train and test datasets, with better performance on the test dataset, for Anand Vihar, Bawana and DTU. For Vivek Vihar, SBD-LSTM performs the best on the train dataset but does not generalise well to the test dataset. We next present the performance of SBD-LSTM in the same plots as the other models (BD-LSTM, BD-LSTM*, UBD-LSTM). However, we cannot make a direct comparison between SBD-LSTM and the other models in terms of prediction accuracy, since SBD-LSTM uses a subset of the initial data (February-September 2019), and PM2.5 values have a lower range in these months, as shown in Figure 5.

Figure 8 provides a long-term (one month ahead) forecast (11th December 2020 to 9th January 2021) of PM2.5 concentration for the different monitoring stations using the UBD-LSTM model. The plots give the mean and 95% confidence interval (±) of the predicted values as uncertainty over 30 experimental runs, at 8-hour intervals for each day, covering a total of 720 hours for the entire month. Figure 9 compares actual and predicted PM2.5 concentration for September 2020, in terms of the mean and 95% confidence interval (±) of the predicted values over 30 experimental runs, for the different monitoring stations using the SBD-LSTM model.

In the respective experiments, we evaluated the performance of different models on the time-series problem and studied the effect of the COVID-19 lockdown on the prediction performance of the models for different seasons. We find that BD-LSTM is our best performer in terms of both train and test performance, with better generalisation than the other LSTM-based models for Anand Vihar. This could be due to the BD-LSTM architecture, which uses two LSTM layers to provide both forward and backward information about a sequence at each time step [95].
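The idea of combining forward and backward information at each time step can be shown with a deliberately simplified sketch. It replaces the LSTM cell with a plain tanh recurrent cell with scalar input and state, so it illustrates only the bidirectional data flow and not the gating of an LSTM; the weights and function names are hypothetical.

```python
import math


def simple_rnn(inputs, w_in, w_rec):
    """Run a scalar tanh recurrent cell over a sequence and return all states."""
    state = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def bidirectional(inputs, fwd=(0.5, 0.1), bwd=(0.3, 0.2)):
    """Pair forward and backward states for every time step.

    fwd and bwd are (w_in, w_rec) weight pairs of two independent cells.
    """
    forward = simple_rnn(inputs, *fwd)
    # The backward cell reads the sequence from the end; reverse its
    # outputs again so that index t refers to the same time step.
    backward = simple_rnn(inputs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))


# Each time step now carries a summary of the past and of the future.
states = bidirectional([1.0, 2.0, 3.0])
print(states)
```

In this sketch the forward state at the first step depends only on the first input, while the backward state at the same step summarises the whole sequence, which is the property the bidirectional design relies on.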
We find that BD-LSTM performs the best for larger prediction horizons (steps 6-10), with better generalisation than the other models, which indicates that BD-LSTM captures more information for longer-horizon prediction. This could be due to the combination of memory cells with two LSTM layers, which can capture salient features in a temporal sequence better than canonical LSTM models. Although BD-LSTM based models have mainly been used for language modelling, we find them promising for long-term time-series prediction problems. We also found that BD-LSTM and BD-LSTM* perform well in predicting future PM2.5 values for the different monitoring stations, which indicates that they are not much affected by the COVID-19 lockdown phase. UBD-LSTM has the highest RMSE and widest confidence interval on the test datasets, which indicates that it is not robust. This poor performance could be because UBD-LSTM uses only one feature to predict future steps, whereas the multivariate models use a set of features as input, which provides additional information for predicting future steps. The seasonal model (SBD-LSTM) performs slightly better on the test dataset for Anand Vihar, Bawana and DTU, with good generalisation performance, which indicates that the model is robust to the effects of the COVID-19 lockdown period. It is important to evaluate how the seasonal decrease of PM2.5 values during the COVID-19 lockdown period, relative to previous years, affects the performance of the different models. We find that the different LSTM models with different training strategies are quite robust in modelling the effects of the COVID-19 lockdown for both the full and partial lockdown periods.

Recent focus has been on studying the effects of the lockdowns imposed in different cities of India on the spatial patterns of air quality during both the pre-lockdown and lockdown phases.
These studies show a clear decrease in the concentration of selected pollutants such as PM2.5 and PM10, with a maximum decrease of more than 50% in comparison to the pre-lockdown phase [104]. There has also been a significant decrease in the concentration of nitrogen dioxide (NO2), which has been a major pollutant in major cities of India such as Delhi and Mumbai [105]. A positive correlation between PM2.5 concentration and COVID-19 has been reported for countries such as India and Pakistan [106]. Apart from the major air pollutants, there has been a significant decrease in aerosol optical depth (AOD) and lightning activity in many urban and mining regions of India [107, 108].

There is a large effort by the Indian government to move towards clean energy, with a massive investment in solar and wind energy which can have benefits [109, 110]. Moreover, India plans to go fully electric in the transportation sector by 2030 [111]. India has begun manufacturing electric vehicles and plans to have a fully electric rail network by 2024 [112]. Renewable energy is expected to rise from 27% of total energy demand in 2014 to around 43% by 2040 [113]. In many developing countries, an increase in population density clearly increases air pollution; however, this can change with the type of energy used. A recent study of China showed that an increase in population density reduces air pollution, owing to clean energy and public transportation [114]. A study of the effects of meteorological conditions and air pollution on COVID-19 transmission in 219 Chinese cities found that air pollution indicators were positively correlated with new confirmed cases, and that increases in cases were linked with the air-quality index [115]. A similar study is needed for India, and prediction of air-quality indicators could help in preventative measures.
In future work, Bayesian deep learning methods can be used to provide robust uncertainty quantification in predictions, extending the Bayesian neural networks used for time series prediction [116]. Other learning strategies such as multi-task and transfer learning, in conjunction with Bayesian inference, can be used to develop improved models [117, 118], which can take into account existing deep learning models for COVID-19 infections in India [119]. Moreover, we envision the application of the proposed framework in other parts of India and the rest of the world where air quality has declined. The respective authorities could implement a web-based framework to support weekly and monthly planning, which could involve the way traffic is managed in different seasons. The methodology can also be extended to air quality prediction in relation to forest fires around the world.

In this paper, we applied deep learning via different LSTM models for univariate and multivariate modelling for short-term and long-term air quality prediction, taking into account four base stations in Delhi, India. Although we selected a subset of the base stations, the methodology can be applied to the rest of the base stations in Delhi or to other parts of the world. Our results show that the multivariate bidirectional LSTM model gives the best performance, and that the rest of the LSTM models have certain strengths and limitations which need to be evaluated prior to developing a system that provides rigorous uncertainty quantification in predictions. We also found that COVID-19 had a significant effect on air quality during the full lockdown, which was implemented for a few months; afterwards, there was unprecedented growth of poor air quality, which has a seasonal effect, as compared to previous years. We provide an open-source software framework and open data which can be used for further verification and for studying or developing prediction models in other places.
We provide a Python-based open-source implementation, along with the data for the different experiments and the design of the respective methods, for further research 3 .

Figure 8: PM2.5 concentration prediction with mean and 95% confidence interval (±) as error for the next one month using the UBD-LSTM model for different monitoring stations. Surviving panel captions: concentration prediction for next one month for DTU; (d) PM2.5 concentration prediction for next one month for Vivek Vihar.

Figure 9: PM2.5 concentration (actual and predicted) with mean and 95% confidence interval (±) as error of the predicted values for September 2020 using the SBD-LSTM model for different monitoring stations. Surviving panel captions: Anand Vihar; (b) Bawana; DTU; (d) Vivek Vihar.

World city network: a global urban analysis. Routledge Air pollution levels rising in many of the world's poorest cities Status of fuel quality and vehicle emission standards latin america Climate change and health in cities: impacts of heat and air pollution and potential co-benefits from mitigation and adaptation Satellite remote sensing of particulate matter and air quality assessment over global cities Fine particulate matter (pm 2.5) in china at a city level Transparent air filter for high-efficiency pm 2.5 capture Association between pm 2.5 and all-cause and specific-cause mortality in 27 us communities The contribution of outdoor air pollution sources to premature mortality on a global scale Unique identification authority of India 6 of the world's 10 most polluted cities are in India Ambient pm2.5 reduces global and regional life expectancy Quantifying the influence of agricultural fires in northwest india on urban air pollution in delhi, india Assessment and system analysis of industrial waste management Spatiotemporal deep learning
model for citywide air pollution interpolation and prediction Deep learning architecture for air quality predictions A deep learning approach for forecasting air pollution in south korea using lstm Fully coupled "online" chemistry in the wrf model Description and evaluation of the model for ozone and related chemical tracers, version 4 (mozart-4) An improved method for monitoring fine particulate matter mass concentrations via satellite remote sensing Predicting traffic accidents through heterogeneous urban data : A case study A unified architecture for natural language processing: Deep neural networks with multitask learning Integrating concept ontology and multitask learning to achieve more effective classifier training for multilevel image annotation Leveraging sequence classification by taxonomy-based multitask learning Multitask learning and the reorganization of work: from tayloristic to holistic organization Analytic complexity and challenges in identifying mixtures of exposures associated with phenotypes in the exposome era Deep learning in neural networks: An overview Learning the hidden structure of speech Backpropagation through time: what it does and how to do it Long short-term memory Competition and collaboration in cooperative coevolution of Elman recurrent neural networks for time-series prediction The vanishing gradient problem during learning recurrent neural nets and problem solutions Learning long-term dependencies with gradient descent is difficult Empirical evaluation of gated recurrent neural networks on sequence modeling Learning phrase representations using rnn encoder-decoder for statistical machine translation Bidirectional recurrent neural networks 2005 special issue: Framewise phoneme classification with bidirectional lstm and other neural network architectures The species severe acute respiratory syndrome-related coronavirus: classifying 2019-ncov and naming it sars-cov-2 Inhibition of sars-cov-2 infections in engineered human tissues 
using clinical-grade soluble human ace2 Coronavirus disease 2019 ( COVID-19): situation report WHO declares COVID-19 a pandemic What will be the economic impact of COVID-19 in the US? rough estimates of disease scenarios Economic effects of coronavirus outbreak (covid-19) on the world economy The potential impact of COVID-19 on gdp and trade: A preliminary assessment Assessing air quality changes in large cities during COVID-19 lockdowns: The impacts of traffic-free urban conditions in almaty, kazakhstan The impact of COVID-19 partial lockdown on the air quality of the city of rio de janeiro, brazil Association between short-term exposure to air pollution and COVID-19 infection: Evidence from china Severe air pollution events not avoided by reduced anthropogenic activities during COVID-19 outbreak India coronavirus: Can the covid-19 lockdown spark a clean air movement? Does lockdown reduce air pollution? evidence from 44 cities in northern China Air quality changes during the COVID-19 lockdown over the yangtze river delta region: An insight into the impact of human activity pattern changes on air pollution variation COVID-19 as a factor influencing air pollution Correlation between weather and COVID-19 pandemic in jakarta, indonesia Effects of temperature variation and humidity on the death of COVID-19 in Wuhan, China Three improved neural network models for air quality forecasting Air quality prediction in milan: Feed-forward neural networks, pruned neural networks and lazy learning Air quality prediction by machine learning methods Prediction of particular matter concentrations by developed feed-forward neural network with rolling mechanism and gray model An lstm-based aggregated model for air pollution forecasting Machine learning-based rapid response tools for regional air pollution modelling A machine learning approach for air quality prediction: Model regularization and optimization Prenatal exposure to pm10 and no2 and children's neurodevelopment from birth 
to 24 months of age: Mothers and children's environmental health (moceh) study Comprehensive identification and isolation policies have effectively suppressed the spread of COVID-19 Assessing schoolchildren's exposure to air pollution during the daily commute -a systematic review The role of influenza vaccination in mitigating the adverse impact of ambient air pollution on lung function in children: New insights from the seven northeastern cities study in china Effects of industrial air pollution on the technical efficiency of agricultural production: Evidence from china Uncovering the invisible effect of air pollution on stock returns: A moderation and mediation analysis Usual suspects: Vehicles, industrial emissions behind foul play Delhi breathed easier from january to april Air pollution: Delhi enjoys cleanest february in three years Air pollution in delhi: Its magnitude and effects on health A synoptic climatology of rural ozone pollution at three forest sites in pennsylvania Temporal, spatial and meteorological variations in hourly pm 2.5 concentration extremes in new york city An Automated Classification Scheme Designed to Better Elucidate the Dependence of Ozone on Meteorology An analysis of the meteorological parameters affecting ambient concentrations of acid aerosols in Atmospheric transport and dispersion of air pollutants associated with vehicular emissions Analysis of dust storms observed in mongolia during 1937-1999 Contrasted effects of relative humidity and precipitation on urban pm2. 
5 pollution in high elevation urban areas Atmospheric chemistry and physics: From air pollution to climate change Visibility as related to atmospheric aerosol constituents The Influence of Pollution on the Shortwave Albedo of Clouds Shade trees reduce building energy use and co2 emissions from power plants Central Pollution Control Board, CCR Finding structure in time Fuzzy finite state automata can be deterministically encoded into recurrent neural networks Training second-order recurrent neural networks using hints Equivalence in knowledge representation: Automata, recurrent neural networks, and dynamical fuzzy systems The vanishing gradient problem during learning recurrent neural nets and problem solutions Adam: A method for stochastic optimization Bidirectional recurrent neural networks Framewise phoneme classification with bidirectional lstm and other neural network architectures Tts synthesis with bidirectional lstm based recurrent neural networks Hybrid speech recognition with deep bidirectional lstm Densely connected bidirectional lstm with applications to sentence classification Sequence to sequence learning with neural networks An experimental study of lstm encoder-decoder model for text simplification A comparison of transformer and lstm encoder decoder models for asr Sequence-to-sequence neural net models for grapheme-to-phoneme conversion Multi-step Time Series Forecasting of Electric Load Using Machine Learning Models Effect of lockdown amid covid-19 pandemic on air quality of the megacity delhi, india The impact of covid-19 as a necessary evil on air pollution in india during the lockdown Can pm2. 5 pollution worsen the death rate due to covid-19 in india and pakistan? 
Effect of lockdown due to sars covid-19 on aerosol optical depth (aod) over urban and mining regions in india Significant decrease of lightning activities during covid-19 lockdown period over kolkata megacity in india Solar rooftop in india: Policies, challenges and outlook Wind energy development and policy in india: A review A review of electric vehicle lifecycle emissions and policy recommendations to increase ev penetration in india Sustainable development and carbon neutrality: Integrated assessment of transport transitions in india Green energy finance in india: Challenges and solutions The influence of increased population density in china on air pollution Effects of meteorological conditions and air pollution on covid-19 transmission: Evidence from 219 chinese cities Langevin-gradient parallel tempering for Bayesian neural learning Bayesian neural multi-source transfer learning Co-evolutionary multi-task learning with predictive recurrence for multi-step chaotic time series prediction Deep learning via LSTM models for COVID-19 infection forecasting in india