Streamflow prediction using an integrated methodology based on convolutional neural network and long short-term memory networks
Authors: Ghimire, Sujan; Yaseen, Zaher Mundher; Farooque, Aitazaz A.; Deo, Ravinesh C.; Zhang, Ji; Tao, Xiaohui
Journal: Sci Rep, 2021-09-01. DOI: 10.1038/s41598-021-96751-4

Streamflow (Q_flow) prediction is one of the essential steps for reliable and robust water resources planning and management. It is highly vital for hydropower operation, agricultural planning, and flood control. In this study, the convolutional neural network (CNN) and the long short-term memory network (LSTM) are combined into a new integrated model, CNN-LSTM, to predict the hourly Q_flow (short-term) at Brisbane River and Teewah Creek, Australia. The CNN layers are used to extract the features of the Q_flow time series, while the LSTM network uses these features for Q_flow time series prediction. The proposed CNN-LSTM model is benchmarked against the standalone CNN, LSTM, and deep neural network (DNN) models and several conventional artificial intelligence (AI) models. Q_flow prediction is conducted for different time intervals with lengths of 1-Week, 2-Weeks, 4-Weeks, and 9-Months. Based on several performance metrics and graphical analyses, the experimental results reveal that, with a small residual error between the actual and predicted Q_flow, the CNN-LSTM model outperforms all the benchmarked conventional AI models as well as the ensemble models for all the time intervals. With 84% of Q_flow prediction errors below 0.05 m³ s⁻¹, the CNN-LSTM demonstrates a better performance than the LSTM and DNN (80% and 66%, respectively). In summary, the results reveal that the proposed CNN-LSTM model, based on the novel framework, yields more accurate predictions; thus, the CNN-LSTM has significant practical value in Q_flow prediction.

The theoretical overview of the deep learning models CNN, LSTM, DNN and CNN-LSTM is presented in this section. The theoretical explanations of the MLP49, GBM50, ELM51, XGB52, DT53, MARS54 and RFR55 are elucidated elsewhere, since they are well-known conventional AI models (MLP, ELM) and ensemble methodologies (GBM, XGB, DT, MARS and RFR).

Convolutional neural network. The CNN model56,57 differs from the MLP by relying on the weight-sharing concept. In the literature, three types of CNN are found: the one-dimensional CNN (Conv1D), the two-dimensional CNN (Conv2D) and the three-dimensional CNN (Conv3D). In Conv1D, the convolution kernels move in one direction, and the input and output data of Conv1D are two-dimensional58. Mainly used for time series data59, Conv1D has a powerful feature-extraction capability: the non-linear features hidden in the raw data are automatically extracted by the alternating convolutional and pooling layers, and adaptive feature learning is completed at the fully connected layer. In this way, the Conv1D algorithm eliminates the manual feature-extraction process of traditional algorithms and realizes end-to-end information processing60. Conv2D is the first standard CNN, introduced in the LeNet-5 architecture61, and is usually used for image data62. It is called Conv2D because the convolution kernels slide along the data in two dimensions. The input and output data of Conv2D are three-dimensional; in image classification, for instance, a CNN can detect edges, color distribution, and so on, making these networks very powerful for image classification and other data containing spatial characteristics63,64. In Conv3D, the convolution kernels move in three directions, and the input and output data of Conv3D are four-dimensional65. Conv3D is mainly used for 3D image data, for instance magnetic resonance imaging (MRI) data, which is widely used to examine the brain, spinal cord, internal organs, and so on; computed tomography (CT) scans, created from a series of X-ray images taken from different angles around the body, are also three-dimensional data. Conv3D is used to classify such medical data or to extract features from it66-68.

Figure 2 shows a one-dimensional (1D) convolution operation, where x1 to x6 represent the inputs while c1 to c4 represent the feature map after the 1D convolution. The red, blue, and green connections are the links between the input layer and the convolving layer; each connection is weighted, and connections of the same color have the same weight value. Thus, only three weight values are needed in Fig. 2 to implement the convolution operation. One major advantage of the CNN model lies in its easier training phase, owing to the smaller number of weights compared with a fully connected architecture; furthermore, it allows the effective extraction of important features.
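As a minimal illustration (not taken from the paper), the following NumPy sketch reproduces the weight-shared operation of Fig. 2 with hypothetical input values: six inputs, one kernel of three shared weights, and four feature-map outputs.

```python
import numpy as np

# Hypothetical inputs x1..x6 and the three shared kernel weights of Fig. 2.
x = np.array([0.2, 0.5, 0.1, 0.4, 0.3, 0.6])
w = np.array([0.5, -0.2, 0.3])

# Valid (no-padding) 1D convolution as used in deep learning (a sliding dot
# product): every output position reuses the same three weights, so only
# len(w) = 3 parameters are trained regardless of the input length.
c = np.array([x[i:i + len(w)] @ w for i in range(len(x) - len(w) + 1)])

print(c)        # feature map c1..c4
print(c.shape)  # (4,)
```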
Each convolutional layer may be represented as follows69:

$h^{(k)} = f\big(W^{(k)} * x + b^{(k)}\big)$

where f is the activation function, $W^{(k)}$ denotes the weights of the kernel linked to the kth feature map, $b^{(k)}$ the corresponding bias, and * represents the convolution operator. The CNN considered in this study has a fully connected layer and three convolutional layers; the selection of the convolutional-layer channels was based on grid search. Furthermore, the activation function used is the rectified linear unit (ReLU), while adaptive moment estimation (Adam) is used as the optimization algorithm. The ReLU can be expressed as:

$f(x) = \max(0, x)$

The one-dimensional (1D) convolution operator is used to simplify the modeling process and to enable real-time Q_flow prediction, since it can make a direct prediction from the 1D Q_flow data.
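For illustration only, the following Keras sketch assembles a feature extractor of this kind: three Conv1D layers, a fully connected layer, ReLU activations and the Adam optimizer. The filter counts and kernel sizes are hypothetical placeholders, since the study selects the channel numbers by grid search (Table 2).

```python
from tensorflow.keras import Sequential, layers, optimizers

# Sketch of the standalone CNN described above (placeholder layer widths).
cnn = Sequential([
    layers.Input(shape=(6, 1)),          # six hourly Q_flow lags, one channel
    layers.Conv1D(32, kernel_size=2, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),    # pooling between layers 1 and 2
    layers.Conv1D(64, kernel_size=2, padding="same", activation="relu"),
    layers.Conv1D(64, kernel_size=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(32, activation="relu"), # fully connected layer
    layers.Dense(1),                     # next-hour Q_flow
])
cnn.compile(optimizer=optimizers.Adam(), loss="mse")
```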
Long short-term memory. Recurrent neural networks (RNNs) are a powerful and robust type of artificial neural network that uses existing time series data to predict the future data over a specified length of time70. However, RNNs can only recollect recent information, not earlier information56. Although RNNs can be trained by back-propagation, they are very difficult to train for long input sequences because of vanishing gradients; hence, the main drawbacks of the RNN architecture are its short memory and its vanishing and exploding gradients71,72. The LSTM model was proposed to overcome the vanishing- and exploding-gradient problem73. LSTMs are a special class of RNN that rely on special units called memory blocks in their hidden layers; these memory blocks perform the role of the normal neurons in the hidden layers73,74. The memory blocks also contain three gate units, called the input, output, and forget gates, which help in updating and controlling the flow of information through the memory blocks75.

The LSTM network operates as follows76: (i) if the input gate is activated, any new input information into the system is accumulated in the cell; (ii) the status of the previous cell is forgotten if the forget gate is activated; and (iii) the propagation of the output of the latest cell to the final state is controlled by the output gate. Figure 3 depicts the LSTM architecture. Regarding streamflow prediction, the historical lagged input data are represented as x = (x1, x2, ..., xT), while the predicted data are depicted as y = (y1, y2, ..., yT). The predicted streamflow series is computed as follows77:

$i_t = \sigma(W_{ix} x_t + W_{im} m_{t-1} + b_i)$
$f_t = \sigma(W_{fx} x_t + W_{fm} m_{t-1} + b_f)$
$c_t = f_t \bullet c_{t-1} + i_t \bullet g(W_{cx} x_t + W_{cm} m_{t-1} + b_c)$
$o_t = \sigma(W_{ox} x_t + W_{om} m_{t-1} + b_o)$
$m_t = o_t \bullet h(c_t)$
$y_t = W_{ym} m_t + b_y$

where $c_t$ denotes the cell activation vectors, $m_t$ the activation vectors for each memory block, W the weight matrices, b the bias vectors, • the scalar product, σ(·) the gate activation function, g(·) the input activation function, and h(·) the output activation function.

Figure 4 shows the proposed CNN-LSTM architecture, in which the lagged hourly streamflow series serve as the inputs while the next-hour streamflow is the output. In the proposed CNN-LSTM architecture, the first half is a CNN used for feature extraction, while the latter half is an LSTM that analyses the CNN-extracted features and performs the next-point streamflow prediction. There are three 1D convolution layers in the CNN part of the proposed CNN-LSTM architecture.

Deep neural network. The DNN concept is closely similar to an artificial neural network with many hidden layers and many nodes in each layer. It can be trained on a set of features that are later used for objective-function approximation78. The naming of DNNs reflects that the networks are typically a composition of numerous functions. Notable applications of DNNs include the prediction of global solar radiation and wind speed79-81.

Figure 6 plots the average Q_flow characteristics for Brisbane River and Teewah Creek by year, month, and day. It can be seen from the figure that the Q_flow of the Brisbane River is greater than that of Teewah Creek; for Brisbane River the Q_flow is at its minimum in June, whereas for Teewah Creek the Q_flow is significantly reduced in July, September and December. Similarly, the peak Q_flow occurs in February for Brisbane River, whereas for Teewah Creek peak Q_flow occurs in March, June, August, and November. In addition, the time series plot of the Q_flow for the year 2017 is shown in Fig. 7.

Model development. Data preparation. During data preparation, the first step is to determine the stationarity of the Q_flow time series; the Dickey-Fuller (DF) test was used for this purpose, and it rejected the null hypothesis that the Q_flow time series is non-stationary. The next step is the correlation-analysis phase, which aims at identifying the model order. The autocorrelation function (ACF) was adopted in this study for the determination of the inputs of the Q_flow prediction model, that is, the input values that correlate maximally with the prediction values (Fig. 8). The major aim of using the ACF analysis method is to support the prediction task82. Owing to the stationarity of the Q_flow time series data, the computed 1-h-interval autocorrelation function decays to values < 0.27 within the so-called correlation time (τc) of about 6 h (i.e., 6 lags of 1 h), as shown in Fig. 8.
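A minimal sketch of this stationarity check and lag selection, using statsmodels, might look as follows; the file name and series are hypothetical stand-ins for the hourly Q_flow records.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller, acf

# Hypothetical hourly streamflow series (file name is a placeholder).
qflow = pd.read_csv("qflow_hourly.csv", index_col=0).squeeze("columns")

# Dickey-Fuller test: a small p-value rejects the null of non-stationarity.
adf_stat, p_value, *_ = adfuller(qflow.dropna())
print(f"ADF statistic = {adf_stat:.3f}, p-value = {p_value:.4f}")

# Autocorrelation over one day of hourly lags; assuming a monotonic decay,
# the model order is the number of lags whose ACF stays >= 0.27 (about six
# 1-h lags here, matching the correlation time reported above).
rho = acf(qflow.dropna(), nlags=24)
model_order = int((rho >= 0.27).sum()) - 1  # exclude lag 0
print(f"Selected model order: {model_order} lags")
```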
Q_flow(t) is considered the time series variable, and the vector (Q_flow(t-6), Q_flow(t-5), Q_flow(t-4), Q_flow(t-3), Q_flow(t-2), Q_flow(t-1), Q_flow(t)) is used in the next step as the input for the prediction of the value Q_flow(t+1).

Data normalization. Using Eq. (5), the modeled dataset was scaled between 0 and 1 to avoid the effect of large variations in the dataset and to ease the simulation, and it was converted back to its original scale after modeling using Eq. (6)83,84:

$Q_{flow,n} = \dfrac{Q_{flow} - Q_{flow(min)}}{Q_{flow(max)} - Q_{flow(min)}}$  (5)

$Q_{flow} = Q_{flow,n}\,\big(Q_{flow(max)} - Q_{flow(min)}\big) + Q_{flow(min)}$  (6)

where Q_flow, Q_flow(min) and Q_flow(max) represent the input data value and its overall minimum and maximum values, respectively. After normalization, the data were segregated into training and testing sets as shown in Table 1. The Q_flow prediction was done for the year 2018 over different ranges, spanning from 1-Week to 9-Months.

This study developed a 3-layer CNN model, a 6-layer LSTM model, a 4-layer CNN-LSTM model, and a 4-layer DNN model. Table 2 presents the hyperparameters of the respective models, which were selected by the trial-and-error method as presented in Table 3. Some of these hyperparameters are model specific.

Common hyperparameters. The DL models share the following four common hyperparameters:

• Activation function: all the network layers rely on the same activation function, ReLU, except for the output layer.
• Dropout: this was considered a potential regularization technique for minimizing the issue of overfitting in order to improve the training performance84. Dropout selects a fraction of the neurons (defined as a real hyperparameter in the range of 0 to 1) at each iteration and prevents them from training. This fraction of neurons was maintained at 0.1 in this study.
• Regularization: two statistical regularizations, L1 (least absolute deviation) and L2 (least square error), were used together with dropout. The role of the L1 and L2 penalization parameters is to minimize the sum of the absolute differences and the sum of the squared differences between the predicted and the target Q_flow values, respectively. Adding a regularization term to the loss encourages smooth network mappings in a DL network by penalizing large parameter values, which reduces the level of nonlinearity that the network models.
• Early stopping: the problem of overfitting was further addressed by introducing the early stopping (ES) criterion from Keras85; the mode was set to minimum while the patience was set to 20. This ensures that training terminates when the validation loss has stopped decreasing for the patience-specified number of epochs.

CNN model hyperparameters.

• Filter size: the size of the convolution-operation filter.
• Number of convolutions: the number of convolutional layers in each CNN.
• Padding: this study utilized 'same' padding in order to ensure that the dimensions of the input feature map and output feature map are the same.
• Pool size: a pooling layer is used between convolution layers to avoid overfitting; this pooling layer helps in decreasing the number of parameters and the network complexity. This study utilized a pool size of 2 between layers 1 and 2.

The proposed CNN-LSTM in this study comprises three convolutional layers with pooling operations; the selection of the convolutional-layer channels was based on grid search. In the architecture of the model, the outputs of the flattening layer serve as the inputs of the LSTM recurrent layer, while the LSTM recurrent layer is directly linked to the final outputs. The inputs of the network are the lagged matrix of hourly Q_flow, and the CNN-LSTM hyperparameters were deduced via the trial-and-error method, as presented in Tables 2 and 3.
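For illustration, a minimal Keras sketch of this stacked CNN-LSTM, together with the dropout and early-stopping settings described above, is given below. The layer widths are placeholders rather than the grid-searched values of Table 2, and the convolutional feature maps are here passed to the LSTM directly as a sequence rather than through a flattening layer.

```python
from tensorflow.keras import Sequential, layers
from tensorflow.keras.callbacks import EarlyStopping

# Sketch of the hybrid CNN-LSTM (Fig. 4); layer widths are placeholders.
cnn_lstm = Sequential([
    layers.Input(shape=(6, 1)),               # lagged hourly Q_flow matrix
    layers.Conv1D(32, 2, padding="same", activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, 2, padding="same", activation="relu"),
    layers.Conv1D(64, 2, padding="same", activation="relu"),
    layers.Dropout(0.1),                      # dropout fraction used in the study
    layers.LSTM(50),                          # reads CNN feature maps as a sequence
    layers.Dense(1),                          # next-hour Q_flow
])
cnn_lstm.compile(optimizer="adam", loss="mse")

# Early stopping as described above: minimum mode with a patience of 20 epochs.
es = EarlyStopping(monitor="val_loss", mode="min", patience=20)
# cnn_lstm.fit(X_train, y_train, validation_split=0.1, epochs=500, callbacks=[es])
```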
Benchmark models. The benchmark models comprise the conventional AI models (MLP, ELM), the ensemble models (GBM, XGB, DT, MARS and RFR) and the standalone deep learning models (CNN, LSTM and DNN) introduced above.

Performance metrics. In this section, the statistical metrics used for model evaluation are reported. Following several machine learning studies of hydrological process simulation, the following statistical metrics were used: the correlation coefficient (r), root mean square error (RMSE, m³ s⁻¹), mean absolute error (MAE, m³ s⁻¹), relative absolute error (RAE, %), integral normalized root squared error (INRSE), Nash-Sutcliffe coefficient (E_NS), Willmott's index (WI), normalised root mean square error (NRMSE) and Legates and McCabe's index (LM)94-98. Several studies have found E_NS and RMSE to be the most commonly used reliable metrics for prediction problems99. Additionally, for the comparison of different models, the promoting percentages of mean absolute error (P_MAE) and of root mean square error (P_RMSE) were computed. Furthermore, the absolute percentage bias (APB) and the Kling-Gupta efficiency (KGE), key performance indicators for Q_flow prediction, were calculated as well100. With $Q_{o,i}$ and $Q_{p,i}$ denoting the ith observed and predicted Q_flow values, $\bar{Q}_o$ and $\bar{Q}_p$ their means, $\sigma_o$ and $\sigma_p$ their standard deviations, and N the number of data points, the metrics follow their standard formulations:

$r = \dfrac{\sum_{i=1}^{N}(Q_{o,i}-\bar{Q}_o)(Q_{p,i}-\bar{Q}_p)}{\sqrt{\sum_{i=1}^{N}(Q_{o,i}-\bar{Q}_o)^2}\sqrt{\sum_{i=1}^{N}(Q_{p,i}-\bar{Q}_p)^2}}$

$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(Q_{p,i}-Q_{o,i})^2}$, $MAE = \frac{1}{N}\sum_{i=1}^{N}|Q_{p,i}-Q_{o,i}|$

$RAE = 100\,\dfrac{\sum_{i=1}^{N}|Q_{p,i}-Q_{o,i}|}{\sum_{i=1}^{N}|Q_{o,i}-\bar{Q}_o|}$, $INRSE = \sqrt{\dfrac{\sum_{i=1}^{N}(Q_{o,i}-Q_{p,i})^2}{\sum_{i=1}^{N}(Q_{o,i}-\bar{Q}_o)^2}}$

$E_{NS} = 1-\dfrac{\sum_{i=1}^{N}(Q_{o,i}-Q_{p,i})^2}{\sum_{i=1}^{N}(Q_{o,i}-\bar{Q}_o)^2}$, $NRMSE = \dfrac{RMSE}{\bar{Q}_o}$

$WI = 1-\dfrac{\sum_{i=1}^{N}(Q_{o,i}-Q_{p,i})^2}{\sum_{i=1}^{N}(|Q_{p,i}-\bar{Q}_o|+|Q_{o,i}-\bar{Q}_o|)^2}$, $LM = 1-\dfrac{\sum_{i=1}^{N}|Q_{o,i}-Q_{p,i}|}{\sum_{i=1}^{N}|Q_{o,i}-\bar{Q}_o|}$

$P_{MAE} = \left|\dfrac{MAE_{benchmark}-MAE_{proposed}}{MAE_{benchmark}}\right|\times 100$ (and analogously for $P_{RMSE}$)

$APB = 100\,\dfrac{\left|\sum_{i=1}^{N}(Q_{o,i}-Q_{p,i})\right|}{\sum_{i=1}^{N}Q_{o,i}}$, $KGE = 1-\sqrt{(r-1)^2+\left(\dfrac{\sigma_p}{\sigma_o}-1\right)^2+\left(\dfrac{\bar{Q}_p}{\bar{Q}_o}-1\right)^2}$
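As a minimal sketch, the core metrics can be computed from arrays of observed and predicted Q_flow (obs and pred are hypothetical inputs):

```python
import numpy as np

def evaluate(obs, pred):
    """Core evaluation metrics in their standard formulations (a sketch)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ens = 1 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    wi = 1 - np.sum(err ** 2) / np.sum(
        (np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    lm = 1 - np.sum(np.abs(err)) / np.sum(np.abs(obs - obs.mean()))
    r = np.corrcoef(obs, pred)[0, 1]
    # Kling-Gupta efficiency from correlation, variability and bias ratios.
    kge = 1 - np.sqrt((r - 1) ** 2
                      + (pred.std() / obs.std() - 1) ** 2
                      + (pred.mean() / obs.mean() - 1) ** 2)
    apb = 100 * np.abs(np.sum(obs - pred)) / np.sum(obs)
    return dict(RMSE=rmse, MAE=mae, ENS=ens, WI=wi, LM=lm, r=r, KGE=kge, APB=apb)
```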
Results and discussion. In this section, the predictability performance of the proposed CNN-LSTM model and the comparable models is assessed in four experimental tests covering 1-Week, 2-Weeks, 4-Weeks, and 9-Months [20% of the total streamflow data (Table 1)]. In hydrological modeling, the E_NS is a widely used metric for the prediction of streamflow, water level, drought and so on, and is regarded as a skill score that measures predictive capability relative to the mean of the observed Q_flow 100. However, the E_NS metric neglects the lower values and overestimates the larger ones98. In addition, Willmott's index (WI) is calculated owing to its merits over r and E_NS: in the computation of the WI, errors and differences are given their appropriate weighting, which overcomes the insensitivity issues98. Further, since WI and E_NS do not take absolute values into account and are oversensitive to peak residual values101, the LM was considered for further model assessment; the LM is not overestimated, since it takes absolute values into account102. The LM results are reported in Table 5.

Figures 9 and 10 show the hydrographs, and Fig. 11 the scatterplots, of the actual and predicted Q_flow obtained by the proposed CNN-LSTM model as well as the conventional AI and ensemble models during the testing period. For brevity, only the plots for the 2-Weeks prediction interval are shown. The hydrographs and the scatterplots demonstrate that the predictions of the CNN-LSTM model were closest to the observed Q_flow values in comparison with the other models. In the fit-line formula (y = mx + c) presented in the scatterplots, where m and c are the model coefficients, m and c are closer to 1 and 0, respectively, with a higher r value of 1.00, than for the ELM, MLP, LSTM, GBM and XGB models. Additionally, the hydrographs also show the relative error (RE) percentage, indicating that the RE of the CNN-LSTM model is comparatively smaller than that of the other comparable models.

It is worth highlighting that the ELM, MLP and XGB models achieve a good predictability performance but are limited in maintaining good predictions for the high Q_flow values (Figs. 9 and 10). On the contrary, the CNN-LSTM model achieves a superior prediction result for the peak values compared with the ELM, MLP and XGB models: it underestimates the peak values by only 1.15%, as opposed to 2.57%, 3.98% and 2.69% for the XGB, ELM and MLP, respectively, for Brisbane River. This demonstrates the suitability of the CNN-LSTM for streamflow prediction.

To avoid scale dependencies and the impact of outliers in the predicted streamflow, the RAE, NRMSE and INRSE have also been recommended in the literature103. Therefore, further evaluation of model performance is conducted in this study using the RAE, NRMSE and INRSE (Table 6). For both sites, the CNN-LSTM model achieves lower values of RAE, NRMSE and INRSE, outperforming the conventional AI and ensemble models. In line with the results presented in Tables 4 and 5, the integration of CNN and LSTM is again shown to enhance the prediction capability.

Furthermore, a comparison of the CNN-LSTM model with the other models is performed in terms of the APB and KGE. The KGE and APB evaluation for the prediction of hourly Q_flow reveals that the CNN-LSTM is the best-performing model, with KGE ≥ 0.991, APB ≤ 0.527 and KGE ≥ 0.991, APB ≤ 1.159 for Brisbane River and Teewah Creek, respectively (Table 7), indicating a good model performance104 and making the CNN-LSTM model a reliable and powerful tool for the prediction of Q_flow.

Figure 12 compares the boxplots of the proposed CNN-LSTM model with those of the standalone deep learning models as well as the conventional AI and ensemble models. The ♦ markers in the figure show the outliers of the absolute prediction error (|PE|) of the testing data, together with the upper quartile, median, and lower quartile. The distributions of the |PE| acquired by the proposed CNN-LSTM model for all sites exhibit much smaller quartiles, followed by those of the standalone deep learning models. From Fig. 12, the accuracy of the proposed CNN-LSTM model for all sites is shown to be better than that of the comparative models. The empirical cumulative distribution function (ECDF, Fig. 13) at each site depicts the prediction capacity of the different models. The proposed CNN-LSTM model is shown to be superior to the conventional AI and ensemble models as well as the standalone LSTM and DNN models. Based on the error range (0 to ±0.05 m³ s⁻¹) for both Brisbane River and Teewah Creek, Fig. 13 shows that the proposed CNN-LSTM model is the most precise model for streamflow prediction.

Figure 14 presents the frequency percentage distribution (histogram) of the prediction error (PE), computed over error brackets with a 0.025 step size, for Brisbane River. Such a graphical presentation assists in a better understanding of a model's prediction performance83. The figure clearly reveals the outperformance of the CNN-LSTM model against the standalone models (DNN and LSTM), the conventional AI models (MLP and ELM) and the ensemble model (XGB), since its PE values are concentrated near zero.
In more quantitative terms, the CNN-LSTM model shows the highest percentage of PE (56%) in the bin (0 < PE ≤ 0.025), followed by the ELM (49%), LSTM (44%), GBM (41%), DNN (40%), and finally the MLP model (0%). The accumulated PE percentages indicate that the PE of the CNN-LSTM model was below 0.15, while the conventional AI models yield a total of 97%, and the ensemble model a total of 89%, of the PE in this band. This again supports the conclusion that the CNN-LSTM is a superior technique for streamflow prediction. To further investigate the prediction performance of the proposed CNN-LSTM model, the P_MAE and P_RMSE of the experimental tests were employed for comparison and analysis.

The model performance is also assessed using the Taylor diagram105 in Fig. 15. The main use of this diagram is to identify the predictive model closest to the observations on a two-dimensional scale (standard deviation as the radial distance and correlation coefficient as the azimuthal angle). The Taylor diagram shows that the output of the CNN-LSTM model is much closer to the actual observations than that of the conventional AI and ensemble models.

Overall, the aforementioned evaluation results suggest that the CNN-LSTM model is superior to the standalone deep learning models as well as the conventional AI and ensemble models. The proposed CNN-LSTM model achieves a promising prediction performance and could be successfully applied to accurate and reliable hourly streamflow prediction. Furthermore, the averaged training times for the CNN-LSTM and the benchmarked models are listed in Table 2.

Future work could involve testing the CNN-LSTM model through the integration of more causal hydrometeorological datasets (e.g., synoptic climate data or rainfall data) as input predictors. During model development, the CNN-LSTM and comparative-model architectures that performed best in the training period were selected as the optimal models (Table 2). However, hyperparameter tuning methods such as grid search106, Tree-structured Parzen estimators (Hyperopt)107, population-based training108, and Bayesian optimization and Hyperband109 could also be used. These hyperparameter tuning methods can be time- and resource-consuming; therefore, a separate study on the selection of the best hyperparameter tuning method could be conducted for Q_flow prediction. In addition, data uncertainty and non-stationarity could be investigated for further insight into their influence on the modeling predictability performance.
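As an illustration of one such method, the sketch below tunes three hypothetical CNN-LSTM hyperparameters with Hyperopt's Tree-structured Parzen estimator; build_and_score is a placeholder standing in for a full training run that returns a validation RMSE.

```python
from hyperopt import fmin, tpe, hp

def build_and_score(filters, lstm_units, dropout):
    # Placeholder: train a CNN-LSTM with these settings and return the
    # validation RMSE. A dummy value stands in for the real training run.
    return abs(dropout - 0.1) + 1.0 / (filters + lstm_units)

space = {
    "filters": hp.choice("filters", [16, 32, 64]),
    "lstm_units": hp.choice("lstm_units", [25, 50, 100]),
    "dropout": hp.uniform("dropout", 0.0, 0.3),
}

# TPE proposes new trials by modeling which regions of the space gave low loss.
best = fmin(fn=lambda p: build_and_score(**p), space=space,
            algo=tpe.suggest, max_evals=50)
print(best)
```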
Furthermore, research could also include

References

• An enhanced extreme learning machine model for river flow forecasting: State-of-the-art, practical applications in water resource engineering area and future research direction
• Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India
• Forecasting daily streamflow using hybrid ANN models
• Machine learning applications in hydrology
• Pixelsnail: An improved autoregressive generative model
• Forecasting based on an ensemble autoregressive moving average-adaptive neuro-fuzzy inference system-neural network-genetic algorithm framework
• Travel time prediction of expressway based on multi-dimensional data and the particle swarm optimization-autoregressive moving average with exogenous input model
• Predictability of monthly temperature and precipitation using automatic time series forecasting methods
• A survey of artificial neural network in wind energy systems
• Artificial neural network
• Global solar radiation prediction by ANN integrated with European Centre for medium range weather forecast fields in solar rich cities of Queensland Australia
• Efficiency evaluation of reverse osmosis desalination plant using hybridized multilayer perceptron with particle swarm optimization
• Support vector machine applications in the field of hydrology: A review
• Open channel sluice gate scouring parameters prediction: Different scenarios of dimensional and nondimensional input parameters
• Non-tuned data intelligent model for soil temperature estimation: A new approach
• Fuzzy Neural Intelligent Systems: Mathematical Foundation and the Applications in Engineering
• Regularized fuzzy neural network based on or neuron for time series forecasting
• Genetic programming in water resources engineering: A state-of-the-art review
• Input attributes optimization using the feasibility of genetic nature inspired algorithm: Application of river flow forecasting
• Optimization of windspeed prediction using an artificial neural network compared with a genetic programming model
• An intelligent evolutionary extreme gradient boosting algorithm development for modeling scour depths under submerged weir
• An insight into machine learning models era in simulating soil, water bodies and adsorption heavy metals: Review, challenges and solutions
• Implementation of univariate paradigm for streamflow simulation using hybrid data-driven model: Case study in tropical region
• River flow forecasting with artificial neural networks using satellite observed precipitation pre-processed with flow length and travel time information: Case study of the Ganges river basin
• Incorporating synoptic-scale climate signals for streamflow modelling over the Mediterranean region using machine learning models
• Short term streamflow forecasting using artificial neural networks
• Neural networks forecasting of flood discharge at an unmeasured station using river upstream information
• Development of a coupled wavelet transform and neural network method for flow forecasting of nonperennial rivers in semi-arid watersheds
• Flow forecast by SWAT model and ANN in Pracana basin
• Hybridized extreme learning machine model with salp swarm algorithm: A novel predictive model for hydrological application
• A comparative study of artificial neural network, adaptive neuro fuzzy inference system and support vector machine for forecasting river flow in the semiarid mountain region
• A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series
• Robustness of extreme learning machine in the prediction of hydrological flow series
• An extreme learning machine model for the simulation of monthly mean streamflow water level in eastern Queensland
• Self-adaptive differential evolutionary extreme learning machines for long-term solar radiation prediction with remotely-sensed MODIS satellite and reanalysis atmospheric products in solar-rich cities
• Deep learning methods and applications
• Deep learning data-intelligence model based on adjusted forecasting window scale: Application in daily streamflow simulation
• Energy demand forecasting using deep learning
• Dimensional sentiment analysis using a regional CNN-LSTM model
• Convolutional, long short-term memory, fully connected deep neural networks
• Action recognition in video sequences using deep bi-directional LSTM with CNN features
• Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats
• Learning to monitor machine health with convolutional bi-directional LSTM networks
• Short-term prediction of residential power energy consumption via CNN and multi-layer bi-directional LSTM networks
• Predicting residential energy consumption using CNN-LSTM neural networks
• Deep solar radiation forecasting with convolutional neural network and long short-term memory network algorithms
• A robust deep learning framework for short-term wind power forecast of a full-scale wind farm using atmospheric variables
• Gold volatility prediction using a CNN-LSTM approach
• Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data
• Pressure drop modelling in sand filters in micro-irrigation using gradient boosted regression trees
• Extreme learning machines: A survey
• Xgboost: A scalable tree boosting system
• CutSplit: A decision-tree combining cutting and splitting for scalable packet classification
• An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
• Multi-head CNN-RNN for multi-time series anomaly detection: An industrial case study
• Convolutional neural networks for time series classification
• WiPass: 1D-CNN-based smartphone keystroke recognition using WiFi signals
• Data source authentication of synchrophasor measurement devices based on 1D-CNN and GRU
• Remaining useful life prediction of roller bearings based on improved 1D-CNN and simple recurrent unit
• Understanding convolutional neural networks with a mathematical model
• Combining a parallel 2D CNN with a self-attention dilated residual network for CTC-Based discrete speech emotion recognition
• Single-view 2D CNNs with fully automatic non-nodule categorization for false positive reduction in pulmonary nodule detection
• Speech emotion recognition using deep 1D & 2D CNN LSTM networks
• Three-dimensional convolutional neural network (3D-CNN) for heterogeneous material homogenization
• A deep-learning-based parameterization for complex geomodels
• Multiple local 3D CNNs for region-based prediction in smart cities
• Blog text quality assessment using a 3D CNN-based statistical framework
• Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition
• Forecasting of COVID-19 using deep layer Recurrent Neural Networks (RNNs) with Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) cells
• Comparing SNNs and RNNs on neuromorphic vision datasets: Similarities and differences
• Period-aware content attention RNNs for time series forecasting with missing values
• Long short-term memory
• Convolutional, long short-term memory, fully connected deep neural networks
• Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization
• Convolutional LSTM network: A machine learning approach for precipitation nowcasting
• Wind speed forecasting approach using secondary decomposition algorithm and Elman neural networks
• A survey of deep neural network architectures and their applications
• Deep neural networks for ultra-short-term wind forecasting
• Transfer learning for short-term wind speed prediction with deep neural networks
• Deep Learning for solar power forecasting-An approach using AutoEncoder and LSTM Neural Networks
• Complementary data-intelligence model for river flow simulation
• A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset
• Forecasting effective drought index using a wavelet extreme learning machine (W-ELM) model
• Scikit-learn: Machine learning in Python
• Gradient boosted regression trees in scikit-learn
• Keras: The python deep learning library
• Python: A programming language for software integration and development
• Introduction to Keras. Deep Learning with Python
• Tensorflow: A system for large-scale machine learning
• The Language of Technical Computing: Computation, Visualization, Programming. Installation Guide for UNIX Version 5
• On the evaluation of model performance in physical geography
• Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance
• Some comments on the evaluation of model performance
• An On-line and off-line pipeline-based architecture of the system for gaps and outlier detection in energy data stream
• Evaluating the use of 'goodness-of-fit' measures in hydrologic and hydroclimatic model validation
• A survey on river water quality modelling using artificial intelligence models: 2000-2020
• Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling
• A time series tool to support the multi-criteria performance evaluation of rainfall-runoff models
• Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique
• A survey of forecast error measures
• Model evaluation guidelines for systematic quantification of accuracy in watershed simulations
• Summarizing multiple aspects of model performance in a single diagram
• On the relationship between classical grid search and probabilistic roadmaps
• Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms
• Population based training of neural networks
• A novel bandit-based approach to hyperparameter optimization

Acknowledgements. The hourly streamflow data used in this paper, obtained from the Water Monitoring Data Portal (Department of Environment and Resource Management), are duly acknowledged. The authors acknowledge the support provided by the University of Southern Queensland Research and Training Scheme (RTS).

Author contributions. S.G.: concept, modeling, software, writing. Z.M.Y.: project leader, writing, validation, formal analysis, visualization. A.A.F.: writing, validation, formal analysis, visualization, funding. R.C.D., J.Z., X.T.: supervision, project leader, writing, validation, formal analysis, visualization.

Competing interests. The authors declare no competing interests.
Correspondence and requests for materials should be addressed to Z.M.Y. Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.