key: cord-0798065-1fhjtvwr authors: Shad, Mohammad; Sharma, Y. D.; Singh, Abhishek title: Forecasting of monthly relative humidity in Delhi, India, using SARIMA and ANN models date: 2022-04-11 journal: Model Earth Syst Environ DOI: 10.1007/s40808-022-01385-8 sha: 0d28ad526a9fbc28bbb77517064f2394e80203f0 doc_id: 798065 cord_uid: 1fhjtvwr Relative humidity plays an important role in climate change and global warming, which has made it a research area of growing concern in recent decades. The present study implemented seasonal autoregressive integrated moving average (SARIMA) and artificial neural network (ANN) with multilayer perceptron (MLP) models to forecast the monthly relative humidity in Delhi, India during 2017–2025. The average monthly relative humidity data for the period 2000–2016 were used to carry out the objectives of the study. The forecast relative humidity shows a declining trend from 2017 to 2025. The accuracy of the models was measured using root mean squared error (RMSE) and mean absolute error (MAE). The results showed that the SARIMA model forecast relative humidity with an RMSE of 6.04 and an MAE of 4.56, whereas the MLP model forecast relative humidity with an RMSE of 4.65 and an MAE of 3.42. This study concluded that the ANN model was more reliable for predicting relative humidity than the SARIMA model. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s40808-022-01385-8. The consistent rise in global temperatures over the twenty-first century continues to pose a serious challenge to humanity. The principal factors of climate change are the rising solar energy and global warming due to the aggravation of greenhouse effects (McCarthy et al. 2001; Griggs and Noguer 2002; Asadi and Karami 2021). Climate change has adversely affected various aspects of human life, such as agriculture, economy, food security, and public health (Dogru et al. 2019; Kogo et al. 
2021; Pasquini et al. 2020) . Relative humidity (RH) is the amount of water vapor or vapor pressure in the air. It is measured as a percentage of the amount of moisture that the atmosphere can hold at the same temperature and pressure (Khatibi et al. 2013; Ghadiri et al. 2020) . Relative humidity is an important climate feature that directly influences some of the major sectors such as human health, hydrological studies, pharmaceutical industry, agriculture, irrigation scheduling, floods, and hydropower (Yu 2009; Gunawardhana et al. 2017) . Besides, extreme values of RH ( < 40% or > 60% ) can negatively impact human health including cold and flu, nasal bleeding, vomiting, asthma attacks, allergies (Falagas et al. 2008; Zhang et al. 2016) . The human body is more susceptible to respiratory diseases like COVID-19 infection during low relative humidity conditions (Mangla et al. 2021) . Besides, high relative humidity causes an increase in precipitation, which can be dangerous for the economy of any country (Silveira 2002) . Moreover, accurate weather information and accurate humidity forecasting are frequently essential for warning about natural disasters induced by sudden changes in climatic conditions (Adnan et al. 2021) . The above discussion emphasizes the urgency of monitoring and predicting relative humidity throughout the year in developing countries like India. Many studies employed various time series models, such as autoregressive integrated moving average (ARIMA) models, fuzzy time series (FTS) models, and artificial neural network (ANN) models for the prediction of entities in various fields, such as hydrology, earth sciences, and economics (Nilashi et al. 2012; Singh 2018; Chen et al. 2019; Luk et al. 2001) . Sarraf et al. (2011) implemented ARIMA model to forecast the relative humidity and monthly mean temperature during 2009-2011 of Ahwaz Station in Iran. Arzu et al. 
(2020) used multiple linear regression (MLR), ARIMA, and ANN models for the prediction of wind speed in Suva, Fiji, and demonstrated that MLR performed better than the ARIMA and ANN models. Li et al. (2019) employed ARIMA and long short-term memory (LSTM) models to forecast the daily average relative humidity in Gansu Linxia, China. This study showed that the ARIMA model provides better accuracy than the LSTM model. Masngut et al. (2020) employed ARIMA and ANN models to forecast the rainfall of Simpang Ampat, Pulau Pinang in Malaysia. The study used the daily rainfall data during the period January 2016 to December 2018 and revealed that the ANN model provides better prediction of rainfall compared to the ARIMA model. Shi et al. (2018) developed a prediction model based on backpropagation neural networks to forecast indoor air temperature and relative humidity every 10 min and 6-72 h in advance in Chongqing, China. The study demonstrated that the proposed model is more effective in predicting temperature. Wanishsakpong and Owusu (2020) implemented ARIMA and ARIMAX models on the average monthly temperature data pertaining to the period 2006 to 2016 in the southwestern region of Thailand and demonstrated that the ARIMA model provided a better forecast of temperature compared to the ARIMAX model. Eymen and Köylü (2019) utilized the Mann-Kendall rank test for trend analysis of relative humidity and wind speed at Yamula Dam, Turkey. Theil-Sen slope method has been used to find out the power of the trend. Further, the ARIMA model has been implemented to forecast the relative humidity. Zhang et al. (2017) proposed a wavelet-ARMA/ARIMA model for forecasting of particulate matter PM 10 and compared its accuracy with the ARMA/ARIMA model. The results revealed that the proposed model outperformed the ARMA/ARIMA model. Casallas et al. 
(2021) implemented an LSTM model to forecast PM 2.5 and meteorological variables (temperature, radiation, humidity, wind speed) for Bogotá, Colombia. The results revealed that the LSTM performs better, especially for PM 2.5 and wind speed. Based on the daily rainfall and streamflow datasets from 1991 to 2014, Ali and Shahbaz (2020) employed the ANN models to forecast the daily river streamflow of Lahore, Pakistan. The estimated parameters of the proposed ANN model have been assessed based on different criteria such as root mean square error (RMSE), correlation coefficient (R), and the coefficient of determination ( R 2 ). The results showed that the ANN model with a composition of input patterns demonstrated to the model significantly affect the learning ability, training time, and functioning of the ANN model. Tao et al. (2022) developed random forest and multivariate adaptive regression spline models for the prediction of relative humidity in Iraq and validated their performance with support vector regression. Astsatryan et al. (2021) utilized the neural network technique for the prediction of hourly temperature in the Ararat valley, Armenia. The results revealed that the suggested model provided 87.31% and 75.57% accuracy in the prediction of temperature for 3 and 24 h. Many studies have been conducted to forecast the relative humidity in India. Namratha V (2020) employed the ARIMA model for the prediction of relative humidity in Bangalore. Kamath and Kamat (2018) forecasted monthly rainfall for Idukki district, Kerala using ARIMA, ANN, and exponential smoothing state space (ETS) models. They showed that the ARIMA model produced more reliable results compared to ANN and ETS models. Kumar et al. (2021) used the ARIMA and machine learning (ML) algorithms for the prediction of air pollution in Assam. The results revealed that the ARIMA model performed better than the machine learning algorithm. Kulkarni et al. 
(2018) employed the ARIMA model for the prediction of air pollution in Nanded, Maharashtra. This study showed that the level of air pollution is increasing in Nanded city. Many studies employed SARIMA models to predict the temperature and precipitation in India. These studies showed that the predicted data were consistent with the trend in the observed data (Dimri et al. 2020; Dabral and Tabing 2020). On the basis of monthly solar insolation data during 1984 , Shadab et al. (2020) implemented a seasonal ARIMA model to forecast solar radiation in Delhi. The proposed seasonal ARIMA model forecast the maximum insolation value in May and the minimum in January and December. Litta et al. (2013) employed an ANN model to predict the temperature and relative humidity in Kolkata during premonsoon thunderstorms in 2009 and examined the utility of ANN for estimating hourly surface temperature and relative humidity. The study showed that the ANN model provides a better prediction of hourly temperature and relative humidity during thunderstorm hours. Rajendra et al. (2019) employed artificial neural network models, namely multilayer perceptron (MLP) and radial basis function (RBF), to predict the meteorological variables of two stations situated in India. The study demonstrated that the MLP and RBF provided 91-96% accuracy for predictions of meteorological variables. Kapadia and Jariwala (2021) developed a model for the prediction of ozone in Surat city using ANN and feature selection techniques, namely, sensitivity analysis, the Boruta algorithm, and the recursive feature elimination (RFE) algorithm. The results revealed that the efficiency of the proposed model was 79.4%. Biswas and Sinha (2021) employed a long short-term memory (LSTM) model and a bidirectional long short-term memory (BiLSTM) model to forecast the Indian Ocean wind speed. 
The study used daily wind speed data from 2006 to 2017 and demonstrated that the BiLSTM model performs much better than the LSTM model. Ramesh and Iyengar (2016) implemented an artificial neural network (ANN) model on the monthly Indian monsoon rainfall data over the course of the twentieth century. The proposed ANN model was found to explain more than 90% of the underlying variance of the data. Lama et al. (2021) used the SARIMA model in conjunction with the exponential autoregressive (EXPAR) and time-delayed neural network (TDNN) models to predict the changes in the monthly rainfall in the Himalayan region of India. The study demonstrated that TDNN has stronger pattern prediction ability and higher forecast accuracy than the SARIMA and EXPAR models. To the best of our knowledge, no study has been conducted to compare the performances of the seasonal autoregressive integrated moving average (SARIMA) and artificial neural network (ANN) with MLP models. Thus, the purpose of this study is to examine the forecasting accuracy and pattern prediction ability of the SARIMA and ANN with MLP models for predicting monthly relative humidity in Delhi, India during 2017-2025. The monthly average relative humidity data (2000-2016) collected by the India Meteorological Department (IMD), Pune were used to fulfill the objectives of the study. The data consist of the average relative humidity in percentage (%) per month. The SARIMA and ANN with MLP models were used to predict the monthly average relative humidity in Delhi, India. The Box-Jenkins (B-J) methodology (Box et al. 2015), which uses stationary stochastic processes, was applied to fit the SARIMA model for predicting the relative humidity in Delhi. On the other hand, a multilayer perceptron algorithm was used to fit the ANN model to predict relative humidity. 
In the SARIMA (p, d, q) × (P, D, Q)_S model, p and q represent the orders of the non-seasonal autoregressive and moving average terms, respectively, and d is the order of differencing. Similarly, P and Q represent the orders of the seasonal autoregressive and moving average terms, respectively, and D represents the order of seasonal differencing. The model can be expressed as

$$\phi_p(B)\,\Phi_P(B^S)\,\nabla^d\,\nabla_S^D\,Y_t = \theta_q(B)\,\Theta_Q(B^S)\,\varepsilon_t \qquad (1)$$

where $\phi_p(B)$ and $\theta_q(B)$ are the non-seasonal autoregressive and moving average polynomials of orders p and q, respectively. Similarly, $\Phi_P(B^S)$ and $\Theta_Q(B^S)$ are the seasonal autoregressive and moving average polynomials of orders P and Q, respectively. B is the back-shift operator, defined by $B^k Y_t = Y_{t-k}$, so that $\nabla^d = (1 - B)^d$ and $\nabla_S^D = (1 - B^S)^D$, and $\varepsilon_t$ denotes the error term, which behaves as a white noise process. S represents the seasonal frequency (S = 12). The SARIMA model is fitted with the Box-Jenkins approach (1976), which contains four steps to forecast the relative humidity: identification of the model's order, parameter estimation, diagnostic checking, and forecasting. In the first step, we must check the stationarity of the time series data in the SARIMA (p, d, q) × (P, D, Q)_S model. If the data are not stationary, we difference the data to make them stationary. In this study, the augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests were applied to check the stationarity of the time series data. In the ADF test, if the p value < 0.05, the given time series is considered to be stationary (Said and Dickey 1985), whereas in the KPSS test, if the p value > 0.05, the given time series is considered to be stationary. To obtain the orders p, q, P, and Q, the autocorrelation function (ACF) and partial autocorrelation function (PACF) were used. Furthermore, the Akaike information criterion (AIC) (Akaike 1974) and diagnostic analysis were used for the selection of the best-fitted model. 
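The identification step rests on two simple operations: seasonal differencing with $(1 - B^{12})$ and the sample autocorrelation function. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names `seasonal_difference` and `sample_acf` and the toy series are illustrative assumptions; the toy values are not the IMD data.

```python
from typing import List


def seasonal_difference(series: List[float], period: int = 12) -> List[float]:
    """Apply the seasonal difference (1 - B^period): y[t] - y[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def sample_acf(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divisor n)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Toy series (hypothetical, not the IMD data): a pure 12-month cycle, three years.
toy = [float(m % 12) for m in range(36)]

# A pure annual cycle vanishes after one seasonal difference.
diffed = seasonal_difference(toy, period=12)

# The autocorrelation at lag 0 is 1 by construction; lag 12 is strongly positive.
acf = sample_acf(toy, max_lag=12)
```

A large spike at lag 12 in `acf` is the kind of pattern that points to a seasonal term with S = 12; the PACF and formal ADF/KPSS tests would normally come from a statistics library and are not reproduced here.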
After identification of the model, the maximum likelihood estimation method was used to estimate the parameters of the fitted model. The Ljung-Box test was used to verify that the residuals of the fitted model follow a white noise process. The Ljung-Box test examines the null hypothesis that no significant autocorrelation remains in the model's residuals and shows whether the model is accurately specified. A p value > 0.05 indicates that the model is properly constructed to describe the correlation information in the time series (Ljung and Box 1978). Finally, the fitted model was employed to forecast the average monthly relative humidity in Delhi during 2017-2025. The concept of ANN was first developed by McCulloch (1943). It is an effective method of information processing that resembles a biological neural network in its characteristics. In the last few decades, ANN has been widely employed for prediction, pattern recognition, feature extraction, and similar tasks (Khan et al. 2016). In the present study, a multilayer perceptron (MLP), which is a class of feed-forward artificial neural network (ANN), was used to forecast the average monthly relative humidity. The MLP is the most widely used ANN technique in modeling hydrological activities (Traore et al. 2010). An MLP can contain more than one layer. The input layer is the first layer and collects the data. The output layer is the final layer and generates the output data. The hidden layers lie between the input and output layers. The MLP model with one hidden layer is represented by I:Hs:Ol, where I denotes the number of nodes in the input layer, H the number of nodes in the hidden layer, O the number of nodes in the output layer, s the logistic sigmoid transfer function, and l the linear transfer function. It is presented in Fig. 1. 
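The I:Hs:Ol notation can be read as a forward pass: a logistic sigmoid on the hidden layer followed by a linear output. The sketch below illustrates that reading in plain Python. It is not the authors' code: the function names, the bias terms, the random weights, and the scaling of humidity to the 0-1 range are all assumptions made for illustration.

```python
import math
import random
from typing import List


def logistic(z: float) -> float:
    """Logistic sigmoid transfer function (the 's' in I:Hs:Ol)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(
    x: List[float],
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[float],
    b_out: float,
) -> float:
    """One forward pass: sigmoid hidden layer, then a linear output node."""
    hidden = [
        logistic(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Linear transfer function (the 'l' in I:Hs:Ol): a plain weighted sum.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical, untrained weights; bias terms are an assumption of this sketch.
rng = random.Random(0)
n_in, n_hidden = 12, 5
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

# Twelve placeholder inputs, e.g. humidity percentages divided by 100.
x = [0.60] * n_in
prediction = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
```

Training, meaning the choice of the weights, is deliberately left out; the sketch only shows how an input vector flows through the two layers.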
Figure 1 shows the structure of an MLP for the average monthly relative humidity data, which creates the relationship between the input and output layers. Each unit in the input and hidden layers processes its input with an activation function and finally transmits the result to the output layer. The relationship between the input and hidden layers in the case of the MLP can be determined as

$$H = f\left(\sum_{i} W_i X_i\right)$$

where $W_i$ represents the weights, $X_i$ the input nodes, and $f$ the activation function. Figure 2 shows the time series graph of monthly average relative humidity in Delhi during 2000-2016. The time series graph indicates that the observed data form a stationary time series. In addition, ADF and KPSS tests were applied to confirm the stationarity of the data. In the ADF test, the p value was less than 0.05, which indicates that the relative humidity data during 2000-2016 are stationary. In the KPSS test, the p value was greater than 0.05 (Table 1), which likewise indicates that the relative humidity data during 2000-2016 are stationary. The autocorrelation function (ACF) and the partial autocorrelation function (PACF) are illustrated in Figs. 3 and 4, respectively. In the ACF plot, significant spikes were found at lag 1 and lag 12. Thus, this plot suggests that the order of the non-seasonal moving average term was q = 1 and the order of the seasonal moving average term was Q = 1. Similarly, the PACF plot showed significant spikes at lags 1 to 4 and at lag 12, which suggested that the order of the non-seasonal autoregressive term was p = 4 and the order of the seasonal autoregressive term was P = 1. On the basis of the minimum value of AIC and diagnostic analysis, we selected SARIMA(1, 0, 0) × (0, 1, 1)_12 as the best-fitted model for predicting the relative humidity. The final model follows Eq. 1 and has the form

$$\phi_1(B)\,\nabla_{12}^{1}\,Y_t = \Theta_1(B^{12})\,\varepsilon_t$$

with an estimated coefficient of 0.33 for the non-seasonal autoregressive term and −0.86 for the seasonal moving average term. Table 2 presents the estimated parameter values of the SARIMA(1, 0, 0) × (0, 1, 1)_12 model. 
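To make the fitted model concrete, its operator form can be expanded into a one-step recursion. The source does not state its sign convention, so the derivation below assumes the common one, $\phi(B) = 1 - \phi_1 B$ and $\Theta(B^{12}) = 1 + \Theta_1 B^{12}$, together with the reported estimates $\phi_1 = 0.33$ and $\Theta_1 = -0.86$. Under a different convention the signs of the coefficients would change.

```latex
% Assumed convention: \phi(B) = 1 - \phi_1 B, \Theta(B^{12}) = 1 + \Theta_1 B^{12}
\begin{align*}
(1 - 0.33B)\,(1 - B^{12})\,Y_t &= (1 - 0.86B^{12})\,\varepsilon_t \\
Y_t - 0.33\,Y_{t-1} - Y_{t-12} + 0.33\,Y_{t-13} &= \varepsilon_t - 0.86\,\varepsilon_{t-12} \\
Y_t &= Y_{t-12} + 0.33\,(Y_{t-1} - Y_{t-13}) + \varepsilon_t - 0.86\,\varepsilon_{t-12}
\end{align*}
```

Read this way, each month's value is the value from the same month one year earlier, adjusted by a fraction of last month's year-over-year change and by a correction based on the forecast error made twelve months ago.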
The best-fitted SARIMA(1, 0, 0) × (0, 1, 1)_12 model had a low AIC score of 1269.93, and its well-behaved residuals are evident from Figs. 5, 6 and 7. The ACF and PACF plots of the residuals show no remaining autocorrelation among the residuals. Further, the Ljung-Box test was applied to the residuals, and the p value was found to be greater than 0.05 (Table 3), which indicates that the residuals of the fitted model follow a white noise process. Hence, the residual plots and the Ljung-Box test showed that the SARIMA(1, 0, 0) × (0, 1, 1)_12 model can be used to forecast the relative humidity in Delhi. This model was therefore employed to forecast the monthly relative humidity in Delhi during 2017-2025. The forecasted values are given in the supplementary table (Table S1). In addition, Fig. 8 shows the forecasted values of the relative humidity in Delhi during 2017-2025. From Fig. 8, we observed that the estimated relative humidity will decrease every year from January to May and from September to October. On the other hand, the relative humidity will increase from June to August and from November to December. The forecast relative humidity attains a low value of 43.62%; its maximum is 75.02% (61.02–89.01), in January 2025. The relative humidity drops by 45.4% between January and May of 2017, and by 6.78% between September and October of 2017. It increases by 37.23% from June to August 2017 and by 9.21% from November to December 2017. Again, relative humidity will decrease by 44.70% from January to April 2025 and by 6.78% from September to October 2025. The relative humidity will increase by 37.23% from May to August 2025 and by 9.21% from November to December 2025. Further, we forecasted the monthly relative humidity using the MLP. The structure of the MLP model consisted of 12 input nodes, 5 hidden layer nodes, and 1 node in the output layer. The selected MLP model is denoted by 12:5s:1l. 
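A network with 12 input nodes and 1 output node needs the monthly series arranged as input/target pairs. The source does not say how the inputs were built; the sketch below assumes, purely for illustration, that the 12 inputs are the 12 preceding monthly values and the target is the following month. The function name `make_lag_windows` and the toy values are hypothetical.

```python
from typing import List, Tuple


def make_lag_windows(
    series: List[float], n_lags: int = 12
) -> Tuple[List[List[float]], List[float]]:
    """Pair each value with the n_lags values that immediately precede it."""
    inputs = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    targets = [series[t] for t in range(n_lags, len(series))]
    return inputs, targets


# Placeholder series of 15 values (not the IMD data).
toy = [float(v) for v in range(1, 16)]

# Assumption: 12 lagged months feed the 12 input nodes of the 12:5s:1l network.
X, y = make_lag_windows(toy, n_lags=12)
```

With 204 monthly observations (2000-2016), this arrangement would give 192 training pairs; multi-step forecasts for 2017-2025 could then be produced by feeding each prediction back in as the newest lag, which is again an assumption rather than the documented procedure.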
The results of the MLP model are presented in the supplementary table (Table S2). Figure 9 shows that the relative humidity will decrease every year from January to April and from September to October during 2017-2025. It will increase from May to August and from November to December during 2017-2025. The minimum relative humidity of 49.17% was found in April 2017. The maximum relative humidity of 73.33% was reached in August 2017. The results of the MLP revealed that relative humidity decreased by 31.86% from January to April and by 5.69% from September to October in both 2017 and 2025. In addition, it increased by 49.07% from May to August and by 3.99% from November to December of 2017. In 2025, relative humidity will fall by 31.86% from January to May and by 5.69% from September to October, while increasing by 49.07% from June to August and by 3.99% between November and December. Table 4 shows the accuracy of the SARIMA and MLP models, measured in terms of RMSE and MAE. The RMSE and MAE for the MLP model were 4.65 and 3.42, which were lower than the RMSE of 6.04 and MAE of 4.56 for the SARIMA model. Thus, the results revealed that the MLP model provides better accuracy than the SARIMA model. In this study, SARIMA and ANN with MLP models were employed to forecast the monthly relative humidity in Delhi, India, during 2017-2025, and their forecasting accuracy was compared in terms of RMSE and MAE. From the results, we observed that the relative humidity will decrease from January to April and from September to October during 2017-2025. It will increase from May to August and from November to December during 2017-2025. The results of the study also indicated that the MLP model achieved better accuracy than the SARIMA model in terms of RMSE and MAE. Therefore, this research suggests that the ANN with MLP model performs better than the SARIMA model, with lower forecasting error. 
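The two accuracy measures used in the comparison follow directly from their definitions: RMSE is the square root of the mean squared error, and MAE is the mean of the absolute errors. The sketch below computes both in plain Python; the function names and the four observed/predicted values are hypothetical and do not come from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical monthly humidity values in percent (not taken from Table 4).
observed = [60.0, 50.0, 70.0, 65.0]
predicted = [58.0, 53.0, 70.0, 61.0]

toy_rmse = rmse(observed, predicted)
toy_mae = mae(observed, predicted)
```

RMSE is never smaller than MAE on the same errors and weights large misses more heavily, which is why reporting both, as the study does, gives a fuller picture than either alone.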
The online version contains supplementary material available at https://doi.org/10.1007/s40808-022-01385-8. The fellowship grant awarded to the first author by the CSIR, India, is gratefully acknowledged. Funding This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Code availability Not applicable. The authors declare that they do not have any kind of conflict regarding this work. Ethics approval Not applicable. Consent for publication Not applicable.
References
Prediction of relative humidity in a high elevated basin of western Karakoram by using different machine learning models
A new look at the statistical model identification
Streamflow forecasting by modeling the rainfall-streamflow relationship using artificial neural networks
Wind speed forecasting using regression, time series and neural network models: a case study of Suva
Modeling of relative humidity trends in Iran
Air temperature forecasting using artificial neural network for Ararat valley
Performances of deep learning models for Indian Ocean wind speed prediction
Long short-term memory artificial neural network approach to forecast meteorology and PM2.5 local variables in Bogotá, Colombia
Fuzzy time series for real-time flood forecasting
Modelling and forecasting of monthly rainfall and temperature time series using SARIMA for trend detection-a case study of Umiam, Meghalaya (India)
Time series analysis of climate variables using seasonal ARIMA approach
Climate change: vulnerability and resilience of tourism and the entire economy
Seasonal trend analysis and ARIMA modeling of relative humidity and wind speed time series around Yamula Dam
Effect of meteorological variables on the incidence of respiratory tract infections
Machine learning approaches for accurate prediction of relative humidity based on temperature and wet-bulb depression. Preprints
Climate change 2001: the scientific basis. Contribution of Working Group I to the third assessment report of the Intergovernmental Panel on Climate Change
An alternative method for predicting relative humidity for climate change studies
Time-series analysis and forecasting of rainfall at Idukki district, Kerala: Machine learning approach
Prediction of tropospheric ozone using artificial neural network (ANN) and feature selection techniques
Neural network model for discharge and water-level prediction for Ramganga River catchment of Ganga Basin
Predictability of relative humidity by two artificial intelligence techniques using noisy data from two Californian gauging stations
Climate change and variability in Kenya: a review of impacts on agriculture and food security
Autoregressive integrated moving average time series model for forecasting air pollution in Nanded city
Analysis and prediction of air pollution in Assam using ARIMA/SARIMA and machine learning. Innovations in sustainable energy and technology
Forecasting monthly rainfall of Sub-Himalayan region of India using parametric and non-parametric modelling approaches
Artificial neural network model in prediction of meteorological parameters during premonsoon thunderstorms
Application of ARIMA and LSTM in relative humidity prediction
On a measure of lack of fit in time series models
An application of artificial neural networks for rainfall forecasting
Comparison of daily rainfall forecasting using multilayer perceptron neural network model
Climate change 2001: impacts, adaptation, and vulnerability: contribution of Working Group II to the third assessment report of the Intergovernmental Panel on Climate Change
ARIMA modelling based relative humidity prediction analysis
Comparative study of artificial neural network and ARIMA models in predicting exchange rate
Emerging climate change-related public health challenges in Africa: a case study of the heat-health vulnerability of informal settlement residents in Dar es Salaam
Use of ANN models in the prediction of meteorological data
New ANN model for forecasting Indian monsoon rainfall
Hypothesis testing in ARIMA(p, 1, q) models
Relative humidity and mean monthly temperature forecasts in Ahwaz station with ARIMA model in time series analysis
Prediction of indoor temperature and relative humidity based on cloud database by using an improved BP neural network in Chongqing
Problems of modern urban drainage in developing countries
Rainfall and financial forecasting using fuzzy time series and neural networks based model
Integration of extreme gradient boosting feature selection approach with machine learning models: application of weather relative humidity prediction
Artificial neural network for modeling reference evapotranspiration complex process in Sudano-Sahelian zone
Optimal time series model for forecasting monthly temperature in the southwestern region of Thailand
Indication of relative humidity of ECMWF in precipitation forecast in Hainan prefecture
Incidence of allergic rhinitis and meteorological variables: non-linear correlation and non-linear regression analysis based on Yunqi theory of Chinese medicine
Forecasting of particulate matter time series using wavelet analysis and wavelet-ARMA/ARIMA model in Taiyuan