key: cord-0636857-f33tifbv authors: Zhou, Peng; Li, Fangyi title: Prediction of Fund Net Value Based on ARIMA-LSTM Hybrid Model date: 2021-11-19 journal: nan DOI: nan sha: 7026e75c5fa6ba87199155eca9bf3b79d8bd3cd4 doc_id: 636857 cord_uid: f33tifbv

The net value of a fund is affected by both its performance and the market, and researchers try to quantify these effects by building models that predict the future net value. Current prediction models usually capture only linear patterns and either handle nonlinear characteristics poorly or ignore them altogether, so their predictions are often inaccurate. This paper uses a fund prediction method based on an ARIMA-LSTM hybrid model. After the historical data are preprocessed, the ARIMA model first captures the linear characteristics of the data; the residuals are then passed to the LSTM model, which extracts the nonlinear characteristics; finally, the predictions of the two models are summed to obtain the prediction of the hybrid model. Empirical results show that the method is more accurate and more widely applicable than traditional fund prediction methods.

With investment and financial education being incorporated into the national education system, public awareness of financial management is gradually increasing [1]. At the same time, owing to the impact of the epidemic, market interest rates have continued to decline, making the fund market, with its low risk and high returns, one of the financial options most favoured by investors. Data show that in 2020 the average yield of partial-stock hybrid funds was nearly 58% and that of equity funds reached 60%, both the highest levels in nearly 11 years. With this performance, funds are actively traded in the market. For fund investment, the change in a fund's net value is an important factor in measuring the fund's profitability.
Predicting a fund's income from its net value has also become a hot research topic in recent years, and a number of scholars have studied the problem. Chen Jianing predicted the income of Internet money funds through wavelet analysis [3], Xiang Ying and others used the ARIMA model to predict fund net value [4], and Meng Guoying applied machine learning to fund performance prediction [5]. However, although a fund is influenced by many market factors, most of these models capture only the linear pattern of net value changes and ignore the effect of nonlinear changes on prediction accuracy. Because the ARIMA model handles the linear features of a time series well and the LSTM model performs well on nonlinear problems [6], this paper proposes a hybrid model, ARIMA-LSTM, that combines the two. After the fund's net value trend is analyzed, the resulting prediction model is compared with the traditional single models. The experimental results show that the ARIMA-LSTM hybrid model fits better and predicts more accurately than the single models when applied to fund net value prediction.

The Autoregressive Integrated Moving Average (ARIMA) model is a commonly used time series forecasting method. Its core idea is to find a suitable mathematical function that fits the linear relationship between the current value, past values, and random disturbances, so that future values can be inferred from past values [7]. The ARIMA model is essentially an extension of the ARMA model, whose mathematical formula is:

$$y_t = \sum_{i=1}^{p} \varphi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

where $\varepsilon_t$ is the residual, $y_t$ is a stationary time series, and $\varphi_i$ and $\theta_j$ are the autoregressive and moving-average coefficients. It should be noted that the ARMA model can only handle stationary time series. If the series is non-stationary, it must first be transformed into a stationary series by differencing.
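Differencing itself is a simple operation, and a minimal sketch in plain Python may make it concrete. The function names `difference` and `undifference` and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper, and in practice stationarity would be checked with a statistical test such as the ADF test rather than by inspection.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing to a sequence of numbers."""
    result = list(series)
    for _ in range(d):
        # Each round replaces the series with the changes between neighbours.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


def undifference(last_level, diffs):
    """Invert one round of differencing, starting from the last known level."""
    levels = []
    level = last_level
    for delta in diffs:
        level += delta
        levels.append(level)
    return levels


# Made-up values standing in for a trending series.
levels = [1, 3, 6, 10]
first_diff = difference(levels, d=1)   # changes between consecutive values
second_diff = difference(levels, d=2)  # changes of the changes
restored = undifference(levels[0], first_diff)
```

Each round of differencing shortens the series by one observation, and forecasts made on the differenced scale have to be accumulated back onto the last observed level, as `undifference` does.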
The resulting model is called the ARIMA model, denoted ARIMA(p, d, q). Here p and q are the orders of the autoregressive and moving-average parts. When p = 0, the model reduces to a q-order MA model, MA(q), and when q = 0, it reduces to a p-order AR model, AR(p). In addition, d is the number of times the series has been differenced. The basic steps of ARIMA modelling are: checking the stationarity of the series with the ADF test, determining the (p, d, q) values of the model, estimating the AR and MA coefficients, testing whether the residuals are white noise, and building the prediction model.

To address the exploding and vanishing gradient problems of the recurrent neural network (RNN), Hochreiter and Schmidhuber proposed an improved recurrent neural network, the Long Short-Term Memory (LSTM) model [8]. Unlike the plain RNN, the LSTM adds a cell state to the hidden layer to preserve long-term memory. The LSTM structure is shown in Figure 1. Internally, the model consists mainly of three control gates: the input gate, the forget gate and the output gate. Here $\sigma$ is the activation function, $C_{t-1}$ and $C_t$ denote the cell state at times $t-1$ and $t$, respectively, and $h_{t-1}$ and $h_t$ are the hidden states of the cell at $t-1$ and $t$.

First, the forget gate uses the hidden state at time $t-1$ and the input $x_t$ to determine how much of the previous information is retained:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Second, the input gate determines how much of the input $x_t$ is stored in the cell state:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

Finally, the output gate produces the hidden state of each cell:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \odot \tanh(C_t)$$

In the above formulas, $W_f$, $W_i$, $W_C$ and $W_o$ are the weight matrices of the different control gates; $b_f$, $b_i$, $b_C$ and $b_o$ are the corresponding bias terms; and $\sigma$ and $\tanh$ are the activation functions, which express how much information passes through the different control gates.
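The gate computations described above can be illustrated with a single LSTM step on scalar values. This is a minimal sketch of the standard LSTM cell, not the authors' implementation: the function `lstm_step`, the `params` layout and the zero-valued weights are assumptions made for illustration, and a real model would use weight matrices and a deep learning library.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps a gate name to a tuple (w_h, w_x, b): the weight on the
    previous hidden state, the weight on the input, and the bias.
    """
    def gate(name, activation):
        w_h, w_x, b = params[name]
        return activation(w_h * h_prev + w_x * x_t + b)

    f_t = gate("forget", sigmoid)           # share of the old cell state kept
    i_t = gate("input", sigmoid)            # share of the candidate stored
    c_tilde = gate("candidate", math.tanh)  # candidate cell content
    o_t = gate("output", sigmoid)           # share of the cell state exposed

    c_t = f_t * c_prev + i_t * c_tilde      # updated cell state
    h_t = o_t * math.tanh(c_t)              # updated hidden state
    return h_t, c_t


# Hypothetical parameters: all weights and biases set to zero.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "candidate", "output")}
h_t, c_t = lstm_step(x_t=0.3, h_prev=0.0, c_prev=2.0, params=zero_params)
```

With all parameters at zero, every sigmoid gate evaluates to 0.5 and the candidate to 0, so the step simply halves the previous cell state, which shows how the forget gate scales the retained memory.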
Changes in a fund's net value are usually strongly nonlinear and irregular [9], so predictions from a single model are often poor. The ARIMA-LSTM hybrid model therefore first captures the linear features of the net value series with the ARIMA model and then passes the remaining nonlinear component to the LSTM model, so that both the linear and the nonlinear characteristics of the data are modelled. Finally, the predictions of the two models are combined to obtain the prediction of the hybrid model. The flowchart is shown in Figure 2. First, the ARIMA model predicts the linear part of the series, giving the predicted value $\hat{L}_t$. Subtracting $\hat{L}_t$ from the true value $y_t$ gives the residual series $e_t = y_t - \hat{L}_t$. The LSTM model is then applied to the residual series to predict the nonlinear part of the fund's net value, giving the predicted value $\hat{N}_t$. Finally, the predicted value of the ARIMA-LSTM hybrid model is the sum of the two: $\hat{y}_t = \hat{L}_t + \hat{N}_t$.

IV. EXAMPLE ANALYSIS

This article uses 1260 days of net value data for the Huabao Hybrid Fund (240008), from June 6, 2016, to July 30, 2021. The data are the historical net values published on the Tiantian Fund website. The 1260 days of data are divided into three parts, as shown in Table 1, for the training of the different models. The paper also adopts the sliding window prediction method [10], shown in Figure 3. With L the length of the training window, the model starts at the leftmost time t, predicts the next day's net value, and then moves forward one day at a time until it reaches the rightmost time T. Note that the model predicts only the next day's net value and uses the preceding L days of historical data for each prediction. Three common error indicators are used to evaluate the prediction accuracy of the different models [11].
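The sliding-window procedure and the two-stage decomposition described in this section can be sketched together in a few lines of Python. The sketch is illustrative only and rests on loud assumptions: `linear_forecast` is a random-walk stand-in for a fitted ARIMA model, `residual_forecast` is a moving-average stand-in for a trained LSTM, and all function names and sample values are hypothetical rather than taken from the paper.

```python
def linear_forecast(window):
    """Stand-in for the ARIMA step: a random-walk forecast (the last value)."""
    return window[-1]


def residual_forecast(recent_residuals):
    """Stand-in for the LSTM step: the mean of the recent residuals."""
    if not recent_residuals:
        return 0.0
    return sum(recent_residuals) / len(recent_residuals)


def hybrid_sliding_forecast(series, window_len):
    """One-step-ahead hybrid forecasts with a sliding window of window_len days."""
    forecasts = []
    residuals = []  # true value minus linear forecast, for days already observed
    for end in range(window_len, len(series)):
        window = series[end - window_len:end]      # the L most recent days
        linear_part = linear_forecast(window)
        nonlinear_part = residual_forecast(residuals[-window_len:])
        forecasts.append(linear_part + nonlinear_part)
        # The true value for this day is only used after it has been forecast.
        residuals.append(series[end] - linear_part)
    return forecasts


# Made-up series with a steady upward drift.
series = [1, 2, 3, 4, 5, 6]
forecasts = hybrid_sliding_forecast(series, window_len=2)
```

On this toy series the linear stand-in always lags by one unit, and the residual stand-in learns that gap after the first step, so the combined forecast matches the series from the second prediction onward. The same structure applies when the two stand-ins are replaced by fitted models.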
The three indicators are the mean squared error (MSE), the mean absolute error (MAE) and the root mean squared error (RMSE), defined as:

$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$$

where $y_t$ is the true value, $\hat{y}_t$ is the predicted value and $n$ is the number of predictions. For all three indicators, the smaller the value, the smaller the error between the predicted and true values, and the higher the accuracy.

Figure 4 shows the original time series of the fund. The data fluctuate sharply with no obvious pattern. In addition, the series rises sharply after day 1000, indicating that it is non-stationary, so differencing is needed to convert it into a stationary series. Figure 5 shows the series after first-order differencing, which is visibly more stable than the original. The ADF test confirms that the series becomes stationary after first-order differencing (d = 1). The parameters p and q of the ARIMA model can then be inferred from the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the series, shown in Figures 6 and 7. The ACF plot cuts off after lag 1, and so does the PACF plot. To improve the accuracy of the model, this article compares the AIC values of different (p, d, q) combinations; the combination (0, 1, 0) has the smallest AIC, -2851, indicating that the model built with this combination has the best fit and is expected to predict best.

When the ARIMA model alone is used to predict the fund's net value, the result is as shown in Figure 8. The ARIMA model predicts the net value poorly after day 1000, which means that on its own it is not suitable for actual fund forecasting. Finally, an LSTM model is built in Python.
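For reference, the three error indicators used in this evaluation can be computed in plain Python as follows. This is a minimal sketch with hypothetical function names and made-up numbers; it is not the authors' evaluation code.

```python
import math


def _check(actual, predicted):
    """Reject empty or mismatched inputs before computing an error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(actual, predicted):
    """Mean squared error."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


# Made-up values: three exact predictions and one that is off by 4.
actual = [1, 2, 3, 4]
predicted = [1, 2, 3, 8]
scores = (mse(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

Because MSE and RMSE square the errors, the single large miss in this toy example weighs more heavily on them than on MAE.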
After repeated parameter tuning, the final LSTM model structure is as follows: the number of layers is 3, the input and output dimensions are both 1, the learning rate is 0.005, the number of training iterations is 100, and the batch_size is 64. Figure 9 and Figure 10 show the hybrid model's fit to the target fund and its forecast of future net value. The values predicted by the ARIMA-LSTM hybrid model roughly follow the real trend, and the fit is significantly better than that of the ARIMA model. The prediction results of the three models on the same data are shown in Table 2. The MSE, MAE and RMSE of the LSTM model are much lower than those of the ARIMA model, and those of the ARIMA-LSTM hybrid model are lower than those of both single models. In terms of prediction accuracy, the three models therefore rank, from best to worst: the ARIMA-LSTM hybrid model, the LSTM model, and the ARIMA model. In conclusion, the ARIMA-LSTM hybrid model is a more reliable time series model and is better suited to predicting a fund's net value in practice than either single model.

Changes in a fund's net value have both linear and nonlinear characteristics. Traditional models struggle with the nonlinearity, which lowers their prediction accuracy. Machine learning methods handle nonlinear problems well but are prone to overfitting on small samples, which also limits their accuracy.
The hybrid model separates the two characteristics and combines the strengths of both models. It performs well on complex time series such as a fund's net value and has proved to be a more reliable tool for analysis and forecasting.

[1] Research on financial consumer education in the era of internet finance
[2] Study on the Impact of COVID-19 Epidemic on China's Securities Investment Funds
[3] Prediction and analysis of Internet money fund income based on wavelet analysis
[4] Application of ARIMA model in fund net value prediction
[5] Performance prediction model of private equity funds based on machine learning
[6] Prediction of Antarctic monthly mean surface temperature based on ARIMA model and LSTM model
[7] Forecasting of particulate matter with a hybrid ARIMA model based on wavelet transformation and seasonal adjustment
[8] Prediction of application system response time based on ARIMA-LSTM combination model
[9] Change and forecast analysis of net value of fund unit based on H-P filter method: Take E Fund for National Defense and Military Industry Mixed Fund as an example
[10] A Novel Fuzzy Linear Regression Sliding Window GARCH Model for Time-Series Forecasting
[11] Selective health indicator for bearings ensemble remaining useful life prediction with genetic algorithm and Weibull proportional hazards model