key: cord-0717877-cirltmy4
authors: Nabipour, M.; Nayyeri, P.; Jabani, H.; Mosavi, A.; Salwana, E.; S., Shahab
title: Deep Learning for Stock Market Prediction
date: 2020-07-30
journal: Entropy (Basel)
DOI: 10.3390/e22080840
sha: 817f46939280b386ec7f36b31de4f9357555e6f0
doc_id: 717877
cord_uid: cirltmy4

The prediction of stock group values has always been attractive and challenging for shareholders due to its inherent dynamics, non-linearity, and complex nature. This paper concentrates on the future prediction of stock market groups. Four groups, namely diversified financials, petroleum, non-metallic minerals, and basic metals, from the Tehran stock exchange were chosen for experimental evaluations. Data were collected for the groups based on 10 years of historical records. The value predictions are created for 1, 2, 5, 10, 15, 20, and 30 days in advance. Various machine learning algorithms were utilized for the prediction of future values of stock market groups. We employed decision tree, bagging, random forest, adaptive boosting (Adaboost), gradient boosting, and eXtreme gradient boosting (XGBoost), as well as artificial neural networks (ANN), recurrent neural networks (RNN), and long short-term memory (LSTM). Ten technical indicators were selected as the inputs into each of the prediction models. Finally, the results of the predictions were presented for each technique based on four metrics. Among all algorithms used in this paper, LSTM shows the most accurate results with the highest model-fitting ability. In addition, for tree-based models, there is often intense competition between Adaboost, Gradient Boosting, and XGBoost.

The prediction process of stock values is always a challenging problem [1] because of its long-term unpredictable nature. The dated market hypothesis holds that it is impossible to predict stock values and that stocks behave randomly, but recent technical analyses show that most stock values are reflected in previous records; therefore, movement trends are vital for predicting values effectively [2]. Moreover, stock market groups and movements are affected by several economic factors such as political events, general economic conditions, commodity price indices, investors' expectations, movements of other stock markets, the psychology of investors, etc. [3]. The value of stock groups is computed from stocks with high market capitalization, and there are different technical parameters for obtaining statistical data from the values of stock prices [4]. Generally, stock indices are derived from the prices of stocks with high market investment. In one study [22], principal component analysis (PCA) representations of such data were employed to predict the daily future of stock market index returns; the results showed that deep neural networks were superior as classifiers based on PCA-represented data compared to others. Das et al. [23] implemented feature optimization by considering the social and biochemical aspects of the firefly method; in their approach, they involved objective-value selection in an evolutionary context. The results indicated that the firefly method, within an evolutionary framework applied to the Online Sequential Extreme Learning Machine (OSELM) prediction method, was the best among the experimented models. Hoseinzade and Haratizadeh [24] proposed a Convolutional Neural Network (CNN) framework that can be applied to various data collections (involving different markets) to extract features for predicting the future movement of the markets.
From the results, remarkable improvement in prediction performance was achieved in comparison with other recent baseline methods. Krishna Kumar and Haider [25] compared the performance of single classifiers with a multi-level classifier, which was a hybrid of machine learning techniques (such as the decision tree, support vector machine, and logistic regression classifiers). The experimental results revealed that the multi-level classifier outperformed the other works and led to a more accurate model with the best predictive ability, with roughly 10 to 12% growth in accuracy. Chung and Shin [26] applied one of the deep learning methods (CNNs) to predicting stock market movement. In addition, a genetic algorithm (GA) was employed to optimize the parameters of the CNN method systematically, and the results showed that the GA-CNN, as the hybrid of GA and CNN, outperformed the comparative models. Sim et al. [27] proposed CNNs for predicting stock prices as a new learning method. The study aimed to solve two problems: using CNNs and optimizing them for stock market data. Wen et al. [28] applied the CNN algorithm to noisy temporal series via frequent patterns as a new method. The results proved that the method was adequately effective and outperformed traditional signal processing methods with a 4 to 7% accuracy improvement. Rekha et al. [29] employed CNN and RNN to compare the two algorithms' results with actual results on stock market data. Lee et al. [30] used CNNs to predict the global stock market and then trained and tested their model with data from other countries. The results demonstrated that the model could be trained on relatively large data and tested on small markets where there were not enough data. Liu et al. [31] investigated a numerical-based attention method with dual-source stock market data to find the complementarity between numerical data and news in the prediction of stock prices. As a result, the method filtered noise effectively and outperformed prior models in dual-source stock prediction. Baek and Kim [32] proposed an approach for stock market index forecasting, which included a prediction LSTM module and an overfitting prevention LSTM module. The results confirmed that the proposed model had excellent forecasting accuracy compared to a model without an overfitting prevention LSTM module. Chung and Shin [33] employed a hybrid approach of LSTM and GA to improve a novel stock market prediction model. The final results showed that the hybrid model of the LSTM network and GA was superior to the benchmark model. Chen et al. [34] used the radial basis function neural network, the extreme learning machine, and three traditional artificial neural networks to evaluate their performance on high-frequency stock market data. Their results indicated that deep learning methods extracted nonlinear features from transaction data and could predict the future of the market powerfully. Zhou et al. [35] applied LSTM and CNN to high-frequency data from the stock market with a rolling partition of training and testing sets to evaluate the effect of the update cycle on model performance. Based on extensive experimental results, the models could effectively reduce errors and increase prediction accuracy. Chong et al. [36] examined the performance of deep learning algorithms for stock market prediction with three unsupervised feature extraction methods: PCA, the restricted Boltzmann machine, and the autoencoder.
Final results, with significant improvement, suggested that additional information could be extracted by deep neural networks from the autoregressive model. Long et al. [37] suggested an innovative end-to-end model named the multi-filters neural network (MFNN), specifically for the price prediction task and feature extraction on financial time series data. Their results indicated that the network outperformed common machine learning methods, statistical models, and convolutional, recurrent, and LSTM networks in terms of accuracy, stability, and profitability. Moews et al. [38] proposed employing deep neural networks that use step-wise linear regressions with exponential smoothing in the preparatory feature engineering for this task, with regression slopes as movement strength indicators for a specified time interval. The final results showed the feasibility of the suggested method, with improved accuracies, accounting for the statistical significance of the results for additional validation, as well as prominent implications for modern economics. Garcia et al. [39] examined the effect of financial indicators on the German DAX-30 stock index by employing a hybrid fuzzy neural network to forecast the one-day-ahead direction of the index with various methods. Their experimental work demonstrated that reducing the dimension through factorial analysis produces less risky and more profitable strategies. Cervelló-Royo and Guijarro [40] compared the performance of four machine learning models to validate the predictive capability of technical indicators in the technological NASDAQ index. The results showed that the random forest outperformed the other models considered in their study, being able to predict the 10-days-ahead market movement with an average accuracy of 80%. Konstantinos et al. [41] suggested an ensemble prediction combination method as an alternative approach to forecasting time series; the ensemble learning technique combined various learning models. Their results indicated the effectiveness of the proposed ensemble learning approach, and the comparative analysis showed adequate evidence that the method could be used successfully for prediction in multivariate time series problems. Overall, all researchers believe that stock price prediction and modeling have been challenging problems for researchers and speculators due to the noisy and non-stationary characteristics of the data. There are minor differences between papers in choosing the most effective indicators for modeling and predicting the future of stock markets. Feature selection can be an important part of such studies for achieving better accuracy; however, all studies indicate that uncertainty is an inherent part of these forecasting tasks because of fundamental variables. Employing new machine learning and deep learning methods, such as recent ensemble learning models, CNNs, and RNNs, with high prediction ability is a significant advantage of recent studies, showing the forecasting potential of these methods in comparison with traditional and common approaches such as statistical analyses. Iran's stock market has been highly popular recently because of the rising growth of the Tehran Stock Exchange Dividend and Price Index (TEDPIX) in the last decades. One of the reasons is that most state-owned firms are being privatized under the general policies of Article 44 of the Iranian constitution, and people are allowed to buy the shares of newly privatized firms under specific circumstances.
This market has some specific attributes in comparison with other countries' stock markets, one of them being a price limitation of ±5% of the opening price of the day for every index. This hinders abnormal market fluctuation and scatters market shocks, political issues, etc. over a specific time, and it can make the market smoother and more predictable. Trading takes place through licensed, registered private brokers of the exchange organization, and the opening price of the next day is determined through the defined base volume of the companies as well as the transaction volume. However, the scarcity of valuable papers on predicting future values in this market with machine learning models is clear. This study concentrates on the process of future value prediction for stock market groups, which is crucial for investors. Despite significant development in Iran's stock market in recent years, there has not been enough research on stock price predictions and movements using novel machine learning methods. This paper aims to compare the performance of several regressors applied to fluctuating data in order to evaluate predictor models, and the predictions are evaluated for 1, 2, 5, 10, 15, 20, and 30 days in advance. In addition, by tuning parameters, we try to reduce errors and increase the accuracy of the models. Ensemble learning models are broadly employed nowadays for their gains in predictive performance. These methods combine multiple forecasts from one or several methods to improve on the accuracy of a simple prediction and to prevent possible overfitting problems. In addition, ANNs are universal approximators and flexible computing frameworks that can be applied to an extensive range of time series prediction problems with a great degree of accuracy. Therefore, considering the literature review, this research work examines the predictability of a set of cutting-edge machine learning methods involving tree-based models and deep learning methods. Employing the whole family of tree-based methods together with RNN and LSTM techniques for regression problems, and comparing their performance on the Tehran stock exchange, is the novel research activity presented in this study. This paper includes three further sections. First, the methodology section presents the evolution of tree-based models, with an introduction to each one; the basic structures of neural networks and recurrent networks are also described briefly. In the research data section, the 10 technical indicators are shown in detail along with the selected method parameters. In the final step, after introducing the regression metrics, the machine learning results are reported for each group and the models' behavior is compared. Since the set of splitting rules employed to segment the predictor space can be summarized in a tree, these types of models are known as decision-tree methods. Figure 1, adapted from [42,43], shows the evolution of tree-based algorithms over several years, and the following sections introduce them.
Decision trees are a popular supervised learning technique used for classification and regression jobs. The purpose is to make a model that predicts a target value by learning simple decision rules formed from the data features.
There are some advantages to using this method, such as being easy to understand and interpret and being able to handle problems with multiple outputs; on the other hand, creating over-complex trees that result in overfitting is a fairly common disadvantage. A schematic illustration of the decision tree is shown in Figure 2, adapted from [43]. A bagging model (as a regressor) is an ensemble estimator that fits each base regressor on random subsets of the dataset and then aggregates their individual predictions, either by voting or by averaging, to make the final prediction. This method is a meta-estimator and can commonly be employed as an approach to decrease the variance of an estimator such as a decision tree, by introducing randomization into its construction procedure and then creating an ensemble out of it. In this method, samples are drawn with replacement, and predictions are obtained through a majority voting mechanism. The random forest model is created from a great number of decision trees. This method simply averages the prediction results of the trees, which is called a forest. In addition, this model has three random concepts: randomly choosing training data when making trees, selecting some subsets of features when splitting nodes, and considering only a subset of all features for splitting each node in each simple decision tree. During training in a random forest, each tree learns from a random sample of the data points. A schematic illustration of the random forest, adapted from [43], is indicated in Figure 3. The boosting method refers to a group of algorithms that convert weak learners into a powerful learner. The method is an ensemble for improving the model predictions of any learning algorithm, and the concept of boosting is to sequentially train weak learners, each correcting its predecessor's performance. AdaBoost is a meta-estimator that starts by fitting a model on the main dataset and then fits additional copies of the model on the same dataset. During the process, the samples' weights are adapted based on the current prediction error, so subsequent models concentrate more on difficult items. The gradient boosting method is similar to AdaBoost in that it sequentially adds predictors to an ensemble model, each correcting its predecessor. In contrast with AdaBoost, gradient boosting fits a new predictor to the residual errors made by the prior predictor, using gradient descent to find the failures in the predictions of the previous learner. Overall, the final model is capable of employing the base model to decrease errors over time.
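To make the differences between these tree-based methods concrete, the following minimal sketch assembles the regressors with scikit-learn; the paper does not state which software was used, so the library choice and all hyperparameter values are illustrative assumptions, not the tuned settings of Table 3.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)

# Illustrative configurations only; the tuned values live in Table 3.
models = {
    # Single tree: interpretable, but prone to over-complex, overfit trees.
    "Decision Tree": DecisionTreeRegressor(max_depth=6),
    # Bagging: base regressors fit on bootstrap subsets, predictions averaged.
    "Bagging": BaggingRegressor(n_estimators=100),
    # Random forest: bagging plus a random feature subset at every split.
    "Random Forest": RandomForestRegressor(n_estimators=100),
    # AdaBoost: sample weights adapted so later trees focus on hard items.
    "Adaboost": AdaBoostRegressor(n_estimators=100, learning_rate=0.1),
    # Gradient boosting: each new tree fits the previous ensemble's residuals.
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=100,
                                                   learning_rate=0.1),
}
```

An XGBoost regressor (discussed next) could be appended to the same dictionary through the separate xgboost package.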
XGBoost is an ensemble tree method (similar to gradient boosting) that applies the principle of boosting to weak learners; however, XGBoost was introduced for better speed and performance. Built-in cross-validation ability, efficient handling of missing data, regularization for avoiding overfitting, cache awareness, tree pruning, and parallelized tree building are common advantages of the XGBoost algorithm. ANNs are single- or multi-layer neural nets that are fully connected. Figure 4, adapted from [43], shows a sample ANN with an input layer, an output layer, and two hidden layers. In a layer, each node is connected to every other node in the next layer, and by increasing the number of hidden layers, it is possible to make the network deeper. Figure 5 illustrates the computation at each hidden or output node: a node takes the weighted sum of the inputs, adds a bias value, and passes the result through an activation function (usually a non-linear function). The result is the output of the node, which becomes an input of the nodes in the next layer. The procedure moves from the input to the output, and the final output is determined by performing this process for all nodes. The learning process adjusts the weights and biases associated with all nodes to train the neural network. Equation (1) shows the relationship between nodes, weights, and biases [44]: the weighted sum of inputs for a layer is passed through a non-linear activation function to a node in the next layer. It can be interpreted in vector form as

z = f(w_1 X_1 + w_2 X_2 + ... + w_n X_n + b), (1)

where X_1, X_2, ..., X_n are the inputs, w_1, w_2, ..., w_n are the respective weights, n is the number of inputs for the final node, b is the bias, f is the activation function, and z is the output.
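As a concrete reading of Equation (1), the sketch below computes a single node's output in NumPy; the tanh activation and all numeric values are assumptions made for illustration only.

```python
import numpy as np

def node_output(x, w, b, f=np.tanh):
    """Equation (1): z = f(w_1*X_1 + ... + w_n*X_n + b)."""
    return f(np.dot(w, x) + b)

x = np.array([0.2, 0.7, 0.1])    # inputs X_1..X_n (n = 3 here)
w = np.array([0.5, -0.3, 0.8])   # weights w_1..w_n
z = node_output(x, w, b=0.1)     # output fed to nodes of the next layer
```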
The training process calculates the weights/biases according to a few rules: initialize the weights/biases for all the nodes randomly, perform a forward pass with the current weights/biases and calculate each node's output, compare the final output with the actual target, and modify the weights/biases accordingly via gradient descent with the backward pass, generally known as the backpropagation algorithm. RNN is a very prominent version of neural networks, extensively used in various processes. In a common neural network, the input is processed through several layers and an output is produced, under the assumption that two consecutive inputs are independent of each other. However, this assumption does not hold in all processes; for example, for the prediction of the stock market at a certain time, it is crucial to consider the previous observations. A simple RNN has multiple neurons that create a network. Each neuron has a time-varying activation function, and each connection between nodes has a real-valued weight that can be modified at each step. According to the general architecture, the output of a node (at time t − 1) is passed to its input (at time t) and combined with the external input (at time t) to make the output (at time t); recurrently exploiting the neuron node to flow through multiple node elements creates the RNN. Figure 6, adapted from [43], shows a simple architecture of an RNN. Furthermore, Equations (2) and (3) indicate the recursive formulas of the RNN [45]:

h_t = σ_h(W_h x_t + U_h h_{t−1} + b_h), (2)
y_t = σ_y(W_y h_t + b_y), (3)

where y_t, h_t, and x_t are the output vector, hidden layer vector, and input vector, respectively; W_h, U_h, and W_y are weighting matrices; b_h and b_y are bias vectors; and σ_h and σ_y are activation functions.
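The recursion of Equations (2) and (3) can be written in a few lines of NumPy; this is an illustrative sketch with tanh and identity activations and randomly initialized weights, not the trained network of the study.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_h, U_h, W_y, b_h, b_y):
    h_t = np.tanh(W_h @ x_t + U_h @ h_prev + b_h)  # Equation (2)
    y_t = W_y @ h_t + b_y                          # Equation (3)
    return h_t, y_t

n_in, n_hid = 10, 4                       # e.g., the 10 technical indicators
rng = np.random.default_rng(0)
W_h = rng.normal(size=(n_hid, n_in))
U_h = rng.normal(size=(n_hid, n_hid))
W_y = rng.normal(size=(1, n_hid))
b_h, b_y = np.zeros(n_hid), np.zeros(1)

h = np.zeros(n_hid)                       # initial hidden state
for x_t in rng.normal(size=(5, n_in)):    # five consecutive time steps
    h, y = rnn_step(x_t, h, W_h, U_h, W_y, b_h, b_y)
```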
LSTM is a specific kind of RNN with a wide range of applications, such as time series analysis, document classification, and speech and voice recognition. In contrast with feedforward ANNs, the predictions made by RNNs are dependent on previous estimations; in practice, however, plain RNNs are not employed extensively because they have a few deficiencies that cause impractical evaluations. The difference between the LSTM and the RNN is that every neuron in the LSTM is a memory cell, which links the prior information to the current neuron. Every neuron has three gates (an input gate, a forget gate, and an output gate), and through these internal gates the LSTM is able to solve the long-term dependence problem of the data. The forget gate controls discarding information from the cell, and Equations (4) and (5) show its related formulas, where h_{t−1} is the output at the prior time (t − 1) and x_t is the input at the current time (t), passed into the sigmoid function S(t); all W and b terms are weight matrices and bias vectors that need to be learned during the training process, and f_t defines how much information will be remembered or forgotten:

f_t = S(W_f [h_{t−1}, x_t] + b_f), (4)
S(t) = 1 / (1 + e^{−t}). (5)

The input gate defines which new information is remembered in the cell state through Equations (6) and (7): the value of i_t is generated to determine how much new information the cell state needs to remember, and a tanh function produces a candidate message to be added to the cell state by taking the output h_{t−1} at the prior time (t − 1) together with the current input information x_t:

i_t = S(W_i [h_{t−1}, x_t] + b_i), (6)
C̃_t = tanh(W_C [h_{t−1}, x_t] + b_C). (7)

C_t holds the updated information that must be added to the cell state (Equation (8), where ⊙ denotes element-wise multiplication):

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t. (8)

The output gate defines which information will be output from the cell state. The value of o_t is between 0 and 1 and is employed to indicate how much of the cell state information needs to be output (Equation (9)), and h_t is the LSTM block's output information at time t (Equation (10)) [45]:

o_t = S(W_o [h_{t−1}, x_t] + b_o), (9)
h_t = o_t ⊙ tanh(C_t). (10)
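A minimal sketch of how such an LSTM regressor can be set up, assuming a Keras implementation (the paper does not name its framework); the layer width of 32 units is an assumption, while the 300 epochs and the 20% validation split echo the settings described in the research data section below.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

days, n_features = 5, 10           # 5-day window of the 10 indicators
model = Sequential([
    LSTM(32, input_shape=(days, n_features)),  # 32 units is an assumption
    Dense(1),                                  # regression output: group value
])
model.compile(optimizer="adam", loss="mse")

# X: (samples, days, features) windows; y: group value n days ahead.
X = np.random.rand(200, days, n_features)      # placeholder data
y = np.random.rand(200)
model.fit(X, y, epochs=300, validation_split=0.2, verbose=0)
```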
This study aims to make short-run predictions for the emerging Iranian stock market and employs data from November 2009 to November 2019 (10 years) for four stock market groups: Diversified Financials, Petroleum, Non-metallic minerals, and Basic metals. From the opening, closing, low, and high prices of the groups, 10 technical indicators are calculated. The data for this study are supplied from the online repository of the Tehran Securities Exchange Technology Management Co. (TSETMC) [46]. Before using the information in the training process, it is vital to take a preprocessing step. We employ data cleaning, which is the process of detecting and correcting inaccurate records in a dataset; it refers to identifying inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty data. The interquartile range (IQR score) is a measure of statistical dispersion that is robust against outliers, and this method is used to detect outliers and modify the dataset. Indeed, as an important point, to prevent an indicator with larger values from dominating those with smaller ones, the values of the 10 technical indicators for all groups are normalized independently. Data normalization refers to rescaling actual numeric features into a 0 to 1 range and is employed in machine learning to create a training model that is less sensitive to the scale of variables. Table 1 indicates all the technical indicators, which are employed as input values based on domain experts and previous studies [47-49]; the input values for calculating the indicators are the opening, high, low, and closing prices in each trading day; "t" means the current time, and "t + 1" and "t − 1" mean one day ahead and one day before, respectively. Table 2 shows the summary statistics of the indicators for the groups.
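The preprocessing just described can be sketched as follows; the 1.5 × IQR fences are one common convention (the text says outliers are detected and modified without giving the exact rule), and the data are hypothetical.

```python
import numpy as np
import pandas as pd

def clean_and_normalize(df: pd.DataFrame) -> pd.DataFrame:
    # IQR-based outlier handling: clip each column to its 1.5*IQR fences.
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    df = df.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)
    # Min-max normalization of every indicator independently to [0, 1].
    return (df - df.min()) / (df.max() - df.min())

indicators = pd.DataFrame(np.random.rand(2600, 10))   # placeholder features
normalized = clean_and_normalize(indicators)
```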
From Table 1, the moving-average-based indicators are defined as follows:

Simple n-day moving average: SMA_t = (C_t + C_{t−1} + ... + C_{t−n+1}) / n.
Weighted 14-day moving average: WMA_t = (n·C_t + (n−1)·C_{t−1} + ... + C_{t−n+1}) / (n + (n−1) + ... + 1).
Moving average convergence divergence: MACD_t = EMA(12)_t − EMA(26)_t.

Here, n is the number of days; C_t is the closing price at time t; L_t and H_t are the low and high prices at time t, respectively; LL_{t..t−n+1} and HH_{t..t−n+1} are the lowest low and highest high prices in the last n days, respectively; UP_t and DW_t mean the upward and downward price changes at time t, respectively; and EMA denotes the exponential moving average. SMA is calculated as the average of prices in a selected range, and this indicator can help determine whether a price will continue its trend. WMA gives a weighted average of the last n values, where the weighting decreases with each prior price. MOM calculates the speed of the rise or fall in stock prices, and it is a very useful indicator of weakness or strength in evaluating prices. STCK is a momentum indicator over a particular period of time that compares a certain closing price of a stock to its price range; the oscillator's sensitivity to market trends can be reduced by adjusting that time period or by taking a moving average of the results. STCD measures the relative position of the closing price in comparison with the amplitude of price oscillations in a certain period. This indicator is based on the assumption that, as prices increase, the closing price tends towards the values in the upper part of the range of price movements in the preceding period, and that when prices decrease, the opposite is true. LWR is a type of momentum indicator that evaluates oversold and overbought levels; sometimes LWR is used to find exit and entry points in the stock market. MACD is another type of momentum indicator, showing the relationship between two moving averages of a share's price. Traders can use it to buy the stock when the MACD crosses above its signal line and sell the shares when the MACD crosses below the signal line. ADO is usually used to find the flow of money into or out of a stock; the ADO line is normally employed by traders seeking to determine the buying or selling time of a stock or to verify the strength of a trend. RSI is a momentum indicator that evaluates the magnitude of recent value changes to assess oversold or overbought conditions for stock prices; RSI is shown as an oscillator (a line graph that moves between two extremes) with values between 0 and 100. CCI is employed as a momentum-based oscillator to determine when a stock price is reaching an oversold or overbought condition; CCI also measures the difference between the historical average price and the current price, and the indicator determines the time of entry or exit for traders by providing trade signals. The dataset used for all models except the RNN and LSTM models is identical: there are 10 features (the 10 technical indicators) and one target (the stock index of the group) for each sample of the dataset. As mentioned, all 10 features are normalized independently before being used to fit the models and improve the performance of the algorithms. Since the goal is to develop models that predict stock group values, the datasets are rearranged to pair the 10 features of each day with the target value n days ahead, as sketched below. In this study, models are evaluated by training them to predict the target value for 1, 2, 5, 10, 15, 20, and 30 days ahead.
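The sketch below illustrates this rearrangement (and two of the indicator formulas from Table 1) in pandas; the column name "close" and the helper name are hypothetical.

```python
import pandas as pd

def add_features_and_target(prices: pd.DataFrame, n_ahead: int, n: int = 10):
    """Pair day-t indicator values with the group value n_ahead days later."""
    out = pd.DataFrame(index=prices.index)
    c = prices["close"]
    # Simple n-day moving average: mean of the last n closing prices.
    out["SMA"] = c.rolling(n).mean()
    # Weighted moving average: weights fall with each prior price.
    w = pd.Series(range(1, n + 1), dtype=float).values
    out["WMA"] = c.rolling(n).apply(lambda x: (x * w).sum() / w.sum(), raw=True)
    # ... the remaining eight indicators would be added analogously ...
    out["target"] = c.shift(-n_ahead)   # value n_ahead trading days ahead
    return out.dropna()

# e.g., the training table for the 5-days-ahead experiment:
# table_5d = add_features_and_target(group_prices, n_ahead=5)
```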
There are several parameters related to each model, but we tried to choose the most effective ones based on our experimental work and prior studies. For tree-based models, the number of trees (ntrees) was the design parameter, while the other common parameters were set identically between all models. Parameters and their values for each model are listed in Table 3. Tree-based models are fairly robust to over-fitting with respect to the number of trees, so a large number typically results in better prediction. The maximum depth of the individual regression estimators limits the number of nodes in the tree; the best value depends on the interaction of the input variables. In machine learning, the learning rate is an important parameter of an optimization method that determines the step size at each iteration while moving toward a minimum of a loss function. For the RNN and LSTM networks, because of their time-series behavior, the datasets are arranged to include the features of more than just one day. While for the ANN model all parameters but the epochs are constant, for the RNN and LSTM models the variable parameters are the number of days included in the training dataset and the respective epochs. As the number of days in the training set increases, the number of epochs is increased so that the models are trained with an adequate number of epochs. Table 4 presents all valid values for the parameters of each model. For example, if five days are included in the training set for the ANN, RNN, or LSTM models, the number of epochs is set to 300 to thoroughly train the models. The activation function of a node in an ANN describes the output of that node given an input or set of inputs; optimizers are methods employed to change the attributes of ANNs, such as the learning rate and weights, in order to reduce the losses; and an epoch denotes one pass of the ANN model over the entire training dataset. In this section, the four metrics used in the study are introduced. Mean Absolute Percentage Error (MAPE) is often employed to assess the performance of prediction methods. MAPE is a measure of prediction accuracy for forecasting methods in the machine learning area, and it commonly presents accuracy as a percentage. Equation (11) shows its formula [50]:

MAPE = (100 / n) × Σ_{t=1..n} |A_t − F_t| / A_t, (11)

where A_t is the actual value and F_t is the forecast value. The absolute value of the difference between them is divided by A_t, these terms are summed over every forecasted value and divided by the number of data points, and the percentage error is finally obtained by multiplying by 100. Mean Absolute Error (MAE) is a measure of the difference between two values: an average of the differences between the predictions and the actual values. MAE is a usual measure of prediction error for regression analysis in the machine learning area. The formula is shown in Equation (12) [50,51]:

MAE = (1 / n) × Σ_{t=1..n} |A_t − F_t|, (12)

where A_t is the true value, F_t is the prediction value, and n is the number of samples. Root Mean Square Error (RMSE) is the standard deviation of the prediction errors in regression work. Prediction errors, or residuals, show the distance between real values and a prediction model and how they are spread out around the model; the metric indicates how concentrated the data are near the best-fitting model. RMSE is the square root of the average of the squared differences between predictions and actual observations. Relative Root Mean Square Error (RRMSE) is similar to RMSE, but it takes the total squared error and normalizes it by dividing by the total squared output of the predictor model. The formula is shown in Equation (13) [50,51]:

RRMSE = sqrt( Σ_{t=1..n} (A_t − F_t)² / Σ_{t=1..n} F_t² ), (13)

where A_t is the observed value, F_t is the prediction value, and n is the number of samples. The Mean Squared Error (MSE) measures the quality of a predictor, and its value is always non-negative (values closer to zero are better). The MSE is the second moment of the error (about the origin) and incorporates both the variance of the prediction model (how widely spread the predictions are from one data sample to another) and its bias (how close the average predicted value is to the observations). The formula is shown in Equation (14) [50]:

MSE = (1 / n) × Σ_{t=1..n} (A_t − F_t)², (14)

where A_t is the observed value, F_t is the prediction value, and n is the number of samples.
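The four metrics translate directly into NumPy; note that the RRMSE body follows the reconstruction in Equation (13) above, so it should be read as an interpretation of the description rather than the paper's verbatim formula.

```python
import numpy as np

def mape(A, F):
    return 100.0 * np.mean(np.abs(A - F) / A)               # Equation (11)

def mae(A, F):
    return np.mean(np.abs(A - F))                           # Equation (12)

def rrmse(A, F):
    return np.sqrt(np.sum((A - F) ** 2) / np.sum(F ** 2))   # Equation (13)

def mse(A, F):
    return np.mean((A - F) ** 2)                            # Equation (14)
```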
Six tree-based models, namely Decision Tree, Bagging, Random Forest, Adaboost, Gradient Boosting, and XGBoost, and three neural-network-based algorithms (ANN, RNN, and LSTM) are employed in the prediction of the four stock market groups. For this purpose, prediction experiments for 1, 2, 5, 10, 15, 20, and 30 days ahead of time are conducted. Results for Diversified Financials, for instance, are depicted in Tables 5-11. For better understanding and to reduce the number of result tables, the average performance of the algorithms for each group is demonstrated in Tables 12-15, and Table 16 shows the average runtime per sample for all models. It is important to note that a comprehensive number of experiments were performed for each of the groups and prediction models with various model parameters; the following tables show the best parameters, for which the minimum prediction error was obtained. Indeed, it is clear from the results that error values generally rise when prediction models are created for a greater number of days ahead. For example, the MAPE values of XGBoost are 0.88, 1.14, 1.45, 1.77, 2.03, 2.30, and 2.48 for the seven horizons, respectively. However, it is possible to observe a less strictly ascending trend in some cases (as was similarly seen in previous studies) due to deficiencies in the prediction ability of some models in some special cases on the main dataset. In this work, we use all 10 technical indicators as 10 input features, and the number of samples is 2600. To prevent overfitting, we randomly split our main dataset into two parts, training data and test data, in the first step and then fit our models on the training data. Seventy percent of the main dataset (1820 samples) is assigned to the training data. Next, the models are used to predict future values, and the metrics are calculated on the test data (780 samples). In addition, we employ regularization and validation data (20% of the training data) to increase accuracy and tune hyperparameters during training (the training process differs here between the tree-based models and the ANNs). Figure 7 shows the performance of XGBoost for five days ahead on Diversified Financials as an example; the comparison between actual values and predicted values indicates the quality of the modeling and the prediction task. It is important to note that the cases are not exactly consecutive trading days because we split our dataset randomly by shuffling.
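A minimal sketch of this evaluation protocol, assuming scikit-learn: the shuffled 70/30 split and the MAPE computation mirror the description above, while the model choice and the data are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X = np.random.rand(2600, 10)        # 2600 samples x 10 indicators (placeholder)
y = np.random.rand(2600) + 1.0      # placeholder targets, kept away from zero

# Shuffled 70/30 split: 1820 training samples, 780 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, shuffle=True, random_state=42)

model = GradientBoostingRegressor().fit(X_train, y_train)
pred = model.predict(X_test)
print("MAPE (%):", 100.0 * np.mean(np.abs(y_test - pred) / y_test))
```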
Considering the literature, our result in this study is among the most accurate predictions, which can be explained by the dataset and the performance of the models. It is true that the training process is very important, but we believe that the role of the dataset is greater here. The dataset is relatively specific because of some rules in the Tehran stock exchange. For example, the value change of each stock is limited to between +5% and −5%, and the closing price of a stock is close to the opening price on the next trading day. These rules are learned by the machine learning algorithms, and the models are then able to predict our dataset from the Tehran stock exchange with significant accuracy. Regarding the results for Diversified Financials as an example, the Adaboost regressor and LSTM can predict future prices well with, on average, 1.59% and 0.60% error, respectively; these values become more important when we know that the maximum range of changes is 10% (from −5% to +5%). So, even with this specific dataset and powerful models, we still have noticeable errors, which indicates the effect of fundamental parameters. Fundamental analysis is a method of measuring a security's intrinsic value by examining related economic and financial factors; this method of stock analysis is considered to be in contrast to technical analysis, which forecasts the direction of prices. Noticeably, many non-scientific factors, such as policies, tax increases, etc., affect the groups in stock markets; for example, the pharmaceutical industries are experiencing growth with Covid-19 at the present time. Based on the extensive experimental work and reported values, the following results are obtained:

• Among tree-based models, there is often intense competition between Adaboost, Gradient Boosting, and XGBoost.
• The average runtime of deep learning models is high compared to the others.
• LSTM is decisively the best model for predicting all stock market groups, with the lowest error and the best ability to fit, but the problem is its long runtime.

In spite of noticeable efforts to find valuable studies on the same stock market, there is no important paper to report, and this deficiency is one of the novelties of this research; we believe that this paper can be a baseline for comparison in future studies. For all investors, it is always necessary to predict stock market changes for detecting accurate profits and reducing potential market risks. This study employed tree-based models (Decision Tree, Bagging, Random Forest, Adaboost, Gradient Boosting, and XGBoost) and neural networks (ANN, RNN, and LSTM) to forecast the values of four stock market groups (Diversified Financials, Petroleum, Non-metallic minerals, and Basic metals) as a regression problem. The predictions were made for 1, 2, 5, 10, 15, 20, and 30 days ahead. To the best of our belief and knowledge, this study is the most successful and recent research work that involves ensemble learning methods and deep learning algorithms for predicting stock groups as a popular application. To be more detailed, exponentially smoothed technical indicators and features were used as inputs for prediction. In this prediction problem, the methods were able to significantly advance their performance, and LSTM was the top performer in comparison with the other techniques. Overall, as a logical conclusion, both tree-based and deep learning algorithms showed remarkable potential in regression problems for predicting the future values of the Tehran stock exchange. Among all models, LSTM was our superior model for predicting all stock market groups, with the lowest error and the best ability to fit (with average MAPE values of 0.60, 1.18, 1.52, and 0.54), but the problem was its great runtime (80.902 ms per sample). As future work, we recommend using the algorithms on other stock markets or examining the effects of other hyperparameters on the final results.
References

Hybridization of evolutionary Levenberg-Marquardt neural networks and data pre-processing for stock market prediction. Knowl.-Based Syst.
Capital markets efficiency: Evidence from the emerging capital market with particular reference to Dhaka stock exchange.
Stock price forecast based on bacterial colony RBF neural network.
Overview and History of Statistics for Equity Markets.
Impact of the stock market capitalization and the banking spread in growth and development in Latin America: A panel data estimation with System GMM.
Stock market value prediction using neural networks.
Stock market prediction with multiple classifiers.
Stock market analysis: A review and taxonomy of prediction techniques.
Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques.
Evaluating multiple classifiers for stock price direction prediction.
Evaluating the employment of technical indicators in predicting stock price index variations using artificial neural networks (case study: Tehran Stock Exchange).
Predicting stock returns by classifier ensembles.
Computational intelligence and financial markets: A survey and future directions.
Stock price prediction using LSTM, RNN and CNN-sliding window model.
An Effective Time Series Analysis for Equity Market Prediction Using Deep Learning Model.
Robust online time series prediction with recurrent neural networks.
Reinforced recurrent neural networks for multi-step-ahead flood forecasts.
An integrated framework of deep learning and knowledge graph for prediction of stock price trend: An application in Chinese stock exchange market.
An innovative neural network approach for stock market prediction.
Stock Market Prediction Using Optimized Deep-ConvLSTM Model.
Augmented Textual Features-Based Stock Market Prediction.
Predicting the daily return direction of the stock market using hybrid machine learning algorithms. Financ.
Stock market prediction using Firefly algorithm with evolutionary framework optimized feature reduction for OSELM method.
CNNpred: CNN-based stock market prediction using a diverse set of variables.
Blended computation of machine learning with the recurrent neural network for intra-day stock market movement prediction using a multi-level classifier.
Genetic algorithm-optimized multi-channel convolutional neural network for stock market prediction.
Is deep learning for image recognition applicable to stock market prediction? Complexity.
Stock Market Trend Prediction Using High-Order Information of Time Series.
Prediction of Stock Market Using Neural Network Strategies.
Global stock market prediction based on stock chart images using deep Q-network.
A numerical-based attention method for stock market prediction with dual information.
ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module.
Genetic algorithm-optimized long short-term memory network for stock market prediction.
Which artificial intelligence algorithm better predicts the Chinese stock market?
Stock market prediction on high-frequency data using generative adversarial nets.
Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies.
Deep learning-based feature engineering for stock price movement prediction. Knowl.-Based Syst.
Lagged correlation-based deep learning for directional trend change prediction in financial time series.
Hybrid fuzzy neural network to predict price direction in the German DAX-30 index.
Forecasting stock market trend: A comparison of machine learning algorithms. Financ. Mark.
Exploring an Ensemble of Methods that Combines Fuzzy Cognitive Maps and Neural Networks in Solving the Time Series Prediction Problem of Gas Consumption in Greece.
Machine Learning: A Probabilistic Perspective.
Deep learning for Stock Market Prediction.
The Handbook of Brain Theory and Neural Networks.
Learning long-term dependencies in NARX recurrent neural networks.
Data science in economics.
Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange.
Predicting stock market index using fusion of machine learning techniques.

The authors declare no conflict of interest.