key: cord-0136123-wcy5v8ll authors: Zhan, Tianxiang; Xiao, Fuyuan title: A novel weighted approach for time series forecasting based on visibility graph date: 2021-03-14 journal: nan DOI: nan sha: 6ec64f8279e7286b041dd91ecc82e288d0e2e6bf doc_id: 136123 cord_uid: wcy5v8ll Time series have attracted widespread attention in many fields today. Based on the analysis of complex networks and visibility graph theory, a new time series forecasting method is proposed. In time series analysis, visibility graph theory transforms time series data into a network model. In the network model, the node similarity index is an important factor. On the basis of directly using the node prediction method with the largest similarity, the node similarity index is used as the weight coefficient to optimize the prediction algorithm. Compared with the single-node sampling node prediction algorithm, the multi-node sampling prediction algorithm can provide more accurate prediction values when the data set is sufficient. According to results of experiments on four real-world representative datasets, the method has more accurate forecasting ability and can provide more accurate forecasts in the field of time series and actual scenes. ries forecasting method. 14 Time series forecasting can also provide a high reference value to prevent hazards, risk forecasts, cost forecasts and so on. 27 Complex network is a hot research topic recently and has influence widely. 5, 15, 16 Challenging problems in studies of complex network include but not limit to influential spreaders identification, 18, 32, 40 link prediction, 11 vulnerability analysis, 33 decision making, 13, 38 network evolution, 31 game theory. 1, 30, 34 The information from the network inheritance can help us better understand the properties of the time series and make better predictions. So there are many researches focus on how to model the time series in the form of network. 25, 41 The visibility graph algorithm is proposed by Lacasa et al., which converts a time series into a complex network. 12 Application of visibility graph is wide. 22, 35 Visibility graphs (VG) transform time series into complex networks. Visibility graph can also be applied in many fields and applications, like image processsing, 9 decision making 23 and so on. Using visibility graph theory, Donner and Donge discovered potential pitfalls through geophysical time series with VG. 6 Some time series prediction methods based on visibility gragh are presented. 17, 20, 21, 28, 39 For example, Zhang et al. propose to use the most similar routine found by superposed random walk algorithm (SRW) 19 as a reference to predict the time series. 37 Mao and Xiao has improved on Zhang et al.'s algorithm. 24 In the past, the prediction result of the time prediction method based on the visibility graph depends on only one time node in the end. When the time series becomes larger, since the forecast result depends on only one time node in the end, more errors will be generated. This paper proposes a new time series forecasting method based on visibility graph, which reduces the error of time series forecasting by weighting the time nodes in the time series. In this paper, there are some fundamental theories in Section 2. In Section 3, the paper provide the explanation of the proposed method. In Section 4, the meaning of the selected data set and experimental results are introduced. The comparison between the experimental results and the actual values reflects the accuracy of the proposed method. In Section 5, it draws a conclusion of the paper. This section will introduce some preliminaries, including VG and link prediction. A time series can describe changes in data over time in terms of a tuple of time nodes and corresponding data. Time series U is as follows: Fig. 1 : Description of the visibility graph: 12 First display the time series with 6 data in the graph, the scale on the horizontal axis is time, and the vertical height is the data value at the corresponding time node. Then, the visibility 12 between the two nodes is calculated through the visibility algorithm 12 to generate edges and convert them into a visibility graph. 12 (The edges are represented by solid lines in the figure) where y m is the actual value of the data, and t m represents the node in time. In the visibility graph, 12 an element (t m , y m ) is defined as a node, represented by node m. Two nodes satisfying the visibility algorithm 12 are defined as edges. The visibility algorithm 12 is based on the visible theory which is a conversion algorithm from a time series into a visibility graph. 12 In a complex network model, the relationship between nodes needs to be considered. So the nodes are analyzed in pairs. Two time series data (t 1 , y 1 ) and (t 2 , y 2 ) are visible in the network to generate an edge, and any data between them (t 3 , y 3 ) is required to satisfy: y c < y b + (y a − y b ) t b − t c t b − t a(2) Link prediction is a method of exploring links which are unknown in the network. 19 The high similarity index of two nodes reflects that the two nodes have a high possibility of linking. Predicting potential links between nodes by node similarity is efficient method to predict future data. Random walk could measure the neighboring similarity of from the effects of the random walk method schematic. 19 The comparability of adjacency defines the pedestrians in the network. The probability transformation matrix P is given to demonstrate the probe of a random walker leaving an upper reach, assuming a network with N subsystems.The elements in P, namely P (x,y) , are calculated by the equation: where x represents the departure node, and y represents the arrival node. If there is an edge between node x and node y, then a (x,y) = 1, otherwise, a (x,y) = 0. k x is the node degree of node x. Random walkers will walk randomly in a network model. There is a feasibility that a walker departs from each node. The probability of leaving x is represented by π x . Choose t as the step index. t needs to be large sufficientque to make sure that walker can walk around the network completely. π x (t) should fullfill the following conditions: The size of π x (0) is N × 1. And π x (0) instructs the original status of the node x. In the original vector π x (0), the element value of x is equal to 1 and the other elements are 0. The similarity between x and y upstream is determined as follows in each step 19 : where |E| is the number of edges in the network . π (x,y) (t) in π x (t) is the y-th part. And superposed random walk (SRW) denotes the resemblance, determined as follows, based on local random walk (LRW): 19 The proposed method is introduced in this section and it involves five steps: Step 1 (Converting the time series to a complex network model) :An existing time series T = {(t 1 , y 1 ) , (t 2 , y 2 ) , ..., (t n , y n )} generates a corresponding visibility graph through the visibility algorithm. Visibility algorithms can convert time series into visibility graph. 12 For any node in the time series, if the node is visible, they can view other data, and on this basis, use some calculations to determine the visibility of each pair of nodes. By judging the visibility of each pair of nodes, a complex network model is constructed. Step 2 (Calculate the similarity of nodes): Firstly, to measure the similarity of any two nodes, use the local random walk. Then, in order to sum the results of LRW, SRW achieves a higher degree of similarity. Step 3 (Calculate the weighted similarity): Weighted similarity(WS) represents the unit vector based on the similarity of the SRW, which is calculated as follows: 19 WS represents the unitized value of the SRW formation vector, which is calculated as follows: Step 4 (Calculate preliminary prediction vectors): Firstly, consider the i-th of (N-1) nodes. According to Zhang et al. research, 37 for the predicted value of a single sample, the value of the prediction component generated by the i-th node is defined as follows, shown in Fig.2 : By aggregating the predicted values of single node sampling into a preliminary prediction vector, the predicted value vector Y P is as follows: Step 5 (Generate prediction results based on weights): Taking the modulus length of each component of S WS x (t n ) as a weighting coefficient, the 9LVLELOLW\*UDSK([DPSOH weighted sum of the component sizes of the prediction vector Y P is the prediction result of this method, which is expressed as the dot product of S WS x (t n ) and the prediction vector Y P , which is defined as follows: The algorithm flow chart is as follows, shown in Fig.3 : Various time series such as financial data, social data, and medical data is predicted in this section. The proposed method will be compared with past time series forecasting methods based on visibility graph. The following five error tests are used: mean absolute difference (MAD), mean absolute percentage error (MAPE), root mean square error (RMSE), and normalized root mean where the expected value isŷ(t), y(t) is the real value, and N is the cumulative amountŷ(t). In order to verify that the precision of the proposed method increases with the size of the data set, it is possible to pick the data set in turn, to successively increase the size of the data set until the whole data set is complete, and to draw the curve of the relationship between the error and the size of the data set. The number of students at the University of Alabama is used as a small data collection in this experiment as the experimental object of the approach proposed. 3 As a general data collection, the registry data from 1971 to 1992 is chosen. The first and second values of the data would be the basic values of the 1973 prediction of the basic data for registration. Predict the number of registrations in turn. From 1973 to 1992, this experiment completed the data prediction. Fig.4 is an intuitive diagram of predicted data and actual data. It can be found that as the sampling nodes increase, the fitted curve gets closer to the true value. Both the actual enrollment number and the predicted enrollment number of the proposed method have experienced almost the same changing trend. Tab.1 lists the experimental error of the method and the experimental error of the comparison method. Obviously, the accuracy is improved compared with Zhang et al. 's method 37 and Gangwar and Kumar's method, 10 but it is not as good as Mao and Xiao's method 24 The Engineering Cost Record (ENR) publishes the Construction Cost Index (CCI) once a month. 8 The CCI data of the construction industry is worthy of reference data, and many scholars in the construction industry have conducted research. A total of 295 CCI data values (CCI data sets from January 1990 to July 2014) are used for forecasting time series. The CCI data is generally on the rise, but there are twists and turns in the details, as shown in Fig.9 . Under the condition of numerical jitter, the fitting effect of this method can be reflected from the figure, that is, the predicted data is not much different from the actual CCI data, and the prediction performance of this method is high. At the same time, the simple moving average model (SMA) 7 is selected here, and the single-node sampling prediction method of Zhang et al. 37 There are a total of 1000 TAIEX data from May 8, 2015 to June 3, 2017, which means that the generated viewable view will have 1000 nodes. Fig.15 shows the real data (black curve) and predicted data (red curve) of TAIEX. Tab.3 lists the prediction errors of the proposed method, Zhang et al. 37 and Mao and Xiao's method. 24 Fig. 25 shows the error of the above prediction method by means of a bar graph. The method measures MAD, RMSE, MAPE, NRMSE and SMAPE. SMA 7 can withdraw the advantages of SMA 7 under large data sets. As the data set size increases, the prediction accuracy of SMA 7 will gradually increase as the error measurement depends on the entire data set, which can be obtained in small data sets from the SMA 7 method. It can be seen in the performance and can be obtained in Section 4.1 and Section 4.2. In large data sets, the proposed methods are more advantageous. After comparing the prediction effect in large data set, the prediction effect of Zhang et al. 37 and Mao and Xiao method 24 is not as good as SMA, 7 and the prediction error is large, which is the opposite of small data set. However, the proposed method will continue to improve the accuracy as the data set increases, and information sources increase. The final proposed method is better than SMA 7 under all five error standards. In January 2020, the new coronavirus spread across the world. Every day, the health departments of various countries will announce the epidemic situation of the previous day (including the number of confirmed cases, the increase of the number of confirmed cases, the number of cured people, etc). By analyzing and predicting the number of confirmed cases as a time series by the state or medical workers, the forecast data can be known before the real data is re- In this section, the number of confirmed cases of new coronavirus in the U.S. from 28 January 2020 to 3 November 2020 will be used for estimation as a data set (the data set scale is marginally greater than the CCI in Section 4.2). 29 Fig.16 shows that the diagnosis is always on an upward trend. The black curve is the actual data, and the red curve is the predicted data. The prediction curve is similar to the actual curve, which means that the prediction method's accuracy is high. Tab.4 shows the prediction error values of this method and the SMA, 7 Zhang et al., 37 Mao and Xiao's method. 24 SMA, 7 the Zhang et al. 37 method, is considerably less accurate than this method. Compared with the process of Mao and Xiao, 24 the proposed method effect is stronger. Fig. 11 Based on the above experimental data and experimental errors of three different types of data sets, the following points are discussed: 1. (Enrollment Forecasting) The number of students at the University of Alabama is a data set with only 22 data. In the case of a small number of samples, comparing Zhang et al. 37 and Mao and Xiao's method (single-node sampling prediction), 24 the accuracy of the proposed method (multi-node sampling prediction) is basically the same, even under this data set, the In the CCI data set, compared with Mao and Xiao's method, 24 RMSE decreased by 0.62, and NRMSE decreased by 0.0119. The same is true of the data set of the number of individuals hospitalized in the United States with new coronary pneumonia, and the errors are minimized to different degrees. In the United States, the growth rate of the number of individuals living with new coronary pneumonia is 10 6 . Under the condition of huge increase in data, the predicted value of SMA 7 has a large deviation from the true value. In view-based methods such as Zhang et al., 24 Mao and Xiao 24 and the proposed method, the proposed method has the smallest error and is worth using. The data set TAIEX is a data set with a larger number of confirmed cases than Enrollment, CCI, and the United States. In many real scenarios, the basic data set is even larger, so the TAIEX experiment can reflect the advantages of a prediction algorithm in a deeper way. On the other hand, the trend of TAIEX is uncertain, with a certain periodicity, and the frequency of data jitter is high. Under this condition, the proposed method is the smallest among all error indicators, and the experimental error is the smallest shown in Tab.4. Compared with other methods, the proposed method has made great progress. In different contexts, it can provide more detailed prediction outcomes for models with large volumes of data, and facilitate the industry's growth. 4. (Time Complexity Comparison) In terms of time complexity, the time complexity of the proposed method is the same as that of Zhang et al. 37 and Mao and Xiao's methods, 24 37 and Mao and Xiao's method. 24 The difference between the time series forecasting method based on complex network and the classical mathematical statistics method is that the characteristics of complex network using network can be further improved through the connection between different time nodes. Zhang et al. 37 and Mao and Xiao's method 24 use the similarity between time nodes to select the most similar time node linear fitting. When the most similar time node is selected, the predicted result is determined, and the predicted result depends on the most similar node. The most similar nodes are limited by different time series structures. The most similar time nodes are the most similar in the sense of the network, not the most accurate node based on this node of linear fitting. The proposed method solves this problem by weighting the similarity of the nodes. The nodes with high similarity contribute more to the prediction result and are more robust. Under the condition of a small number of nodes, the proposed method will give relatively high weight to nodes with low similarity during random walk, which will lead to relatively inaccurate prediction results. Recently, time series prediction in the framework under complex network is paid great attension. In this paper, a new time series prediction method is presented. The similarity generated after random walk is a key factor for prediction in the network. A new weighted node similairty is constructed. The results show the efficiency of the proposed method. In future work, the proposed method will continue to be improved. Can Environmental Manipulation Help Suppress Cancer? Non-Linear Competition Among Tumor Cells in Periodically Changing Conditions Dual hesitant fuzzy set-based intuitionistic fuzzy time series forecasting Forecasting enrollments based on fuzzy time series. Fuzzy sets and systems Long-term predictive value interval with the fuzzy time series Self-organized interdependence among populations promotes cooperation by means of coevolution Visibility graph analysis of geophysical time series: Potentials and possible pitfalls A two-factor autoregressive moving average model based on fuzzy fluctuation logical relationships Time series models for forecasting construction costs using time series indexes Visibility graphs for image processing A computational method of forecasting based on intuitionistic fuzzy sets and fuzzy time series Link propagation: A fast semisupervised learning algorithm for link prediction From time series to complex networks: The visibility graph Multi-level information fusion to alleviate network congestion Dynamic similar sub-series selection method for time series forecasting The effect of multigame on cooperation in spatial network Popularity enhances the interdependent network reciprocity A fast algorithm for network forecasting time series GMM: A Generalized Mechanics Model for Identifying the Importance of Nodes in Complex Networks. Knowledge-Based Systems Link prediction based on local random walk Visibility graph network analysis of gold price time series Horizontal visibility graphs: Exact results for random time series Analytical properties of horizontal visibility graphs in the feigenbaum scenario Alternatives selection for produced water management: A network-based methodology Time series forecasting based on complex network analysis Multivariate time series link prediction for evolving heterogeneous network A new bandwidth interval based forecasting method for enrollments using fuzzy time series Identification and prediction of bifurcation tipping points using complex networks based on quasi-isometric mapping Vector visibility graph from multivariate time series: a new method for characterizing nonlinear dynamic behavior in two-phase flow Coronavirus pandemic (covid-19) Predator Dormancy is a Stable Adaptive Strategy due to Parrondo's Paradox Progressive information polarization in a complex-network entropic social dynamics model Vital spreaders identification in complex networks with multilocal dimension. Knowledge-Based Systems Evaluating topological vulnerability based on fuzzy fractal dimension Passive network evolution promotes group welfare in complex networks Online visibility graphs: Encoding visibility in a binary search tree Weighted fuzzy time series models for taiex forecasting A novel method for forecasting time series based on fuzzy logic and visibility graph Complex network modeling of evidence theory An efficient network method for time series forecasting based on the DC algorithm and visibility relation A novel model to identify the influential nodes: Evidence Theory Centrality Complex network approaches to nonlinear time series analysis The authors greatly appreciate the reviewers' suggestions and the editor's encouragement. This research is supported by the National Natural Science Foundation of China (No.62003280).