key: cord-0680656-pid2pgg9
authors: Peng, Hao; Chen, Pei; Liu, Rui; Chen, Luonan
title: Spatiotemporal convolutional network for time-series prediction and causal inference
date: 2021-07-03
journal: nan
DOI: nan
sha: d3eef5d1b1f7daec308a7afc8258d2d9b5469757
doc_id: 680656
cord_uid: pid2pgg9

Making predictions in a robust way is not easy for nonlinear systems. In this work, a neural network computing framework, i.e., a spatiotemporal convolutional network (STCN), was developed to efficiently and accurately render multistep-ahead predictions of a time series by employing a spatial-temporal information (STI) transformation. The STCN combines the advantages of both the temporal convolutional network (TCN) and the STI equation, which maps the high-dimensional/spatial data to the future temporal values of a target variable, thus naturally providing the prediction of the target variable. From the observed variables, the STCN also infers the causal factors of the target variable in the sense of Granger causality, which are in turn selected as effective spatial information to improve the prediction robustness. The STCN was successfully applied to both benchmark systems and real-world datasets, all of which show superior and robust performance in multistep-ahead prediction, even when the data were perturbed by noise. From both theoretical and computational viewpoints, the STCN has great potential for practical applications in artificial intelligence (AI) and machine learning as a model-free method based only on the observed data, and it also opens a new way to explore observed high-dimensional data in a dynamical manner for machine learning.

It is a challenging task to render multistep-ahead predictions of a nonlinear dynamical system based on time-series data due to its complicated nonlinearity and insufficient information regarding future dynamics. Although many methods, including statistical regression (e.g., the autoregressive integrated moving average (ARIMA) [1] and robust regression [2]), exponential smoothing [3, 4], and machine learning (e.g., the long short-term memory (LSTM) network) [5, 6], have been applied to the issue of predictability [7-10], most of them cannot make satisfactory predictions from short-term time series due to insufficient information. To solve this problem, the auto-reservoir neural network (ARNN) [11] was developed by using the semilinearized spatial-temporal information (STI) transformation equation [11, 12], which transforms high-dimensional information into the temporal dynamics of any target variable, thus effectively extending the data size. However, this approach does not fully explore the nonlinearity of the STI equation from the observed data, which is essential for accurately predicting many complex systems. In addition, few existing approaches take the spatial and temporal causal interactions of high-dimensional time-series data into consideration, although such interactions can compensate for insufficient data and provide reliable information for predicting a complex dynamical system.
By assuming that the steady state of a high-dimensional dynamical system is contained in a low-dimensional manifold, an assumption that actually holds for most real-world systems, the STI transformation equation [10, 12, 13] has been theoretically derived from delay embedding theory [14, 15]. This equation can transform the spatial information of high-dimensional data into the temporal information of any target variable, thus equivalently expanding the sample size. Based on the STI transformation, the randomly distributed embedding (RDE) framework has been developed for one-step-ahead prediction from short-term high-dimensional time series by separately constructing multiple STI maps (or primary STI equations) to form the distribution of the predicted values [12]. Our recent auto-reservoir computing framework, ARNN [11], achieves multistep-ahead prediction based on a semi-linearized STI transformation; however, the nonlinear features and causal relations of the observed high-dimensional variables have not yet been well exploited, although doing so can further improve the prediction robustness and accuracy. On the other hand, the temporal convolutional network (TCN) [16] was recently reported to outperform canonical recurrent networks, such as the LSTM network [5, 6] and the gated recurrent unit (GRU) [17], across a diverse range of sequence modeling tasks and datasets. Causal convolution, which operates only on the information before the current component, is used in the TCN to ensure no leakage from the future into the past. The TCN also employs dilated convolution, which enables an exponentially large receptive field, to handle long sequences. Other advantages of the TCN are demonstrated in [16], including a longer effective memory length, parallelism, a flexible receptive field size, stable gradients, a low memory requirement for training, and variable-length inputs. Moreover, the TCN has proven to be a promising substitute for recurrent neural networks (RNNs) in multiple canonical applications and fields [18-20]. However, the generic TCN architecture cannot explore the spatial-temporal dynamics among high-dimensional variables, and the causal relations of the variables need to be further probed. In this study, we propose a novel convolutional network, i.e., the spatiotemporal convolutional network (STCN), to achieve accurate and robust multistep-ahead prediction with high-dimensional data. The central idea is to represent both the primary and conjugate STI equations in an autoencoder form of the TCN (Fig. 1) by exploiting the advantages of the TCN causal convolution and the STI nonlinear transformation. Computationally, the STCN includes three basic processes: (1) the embedding scheme to reconstruct the phase space (Fig. 1a), (2) the TCN autoencoder to realize the STI transformation (Fig. 1b and 1c), and (3) effective variable selection to make the prediction accurate and robust (Fig. 1d). In particular, we adopt both the primary and the conjugate forms of the STI equations to encode (through the nonlinear function $\Phi$) and decode (through the inverse function $\Phi^{-1}$) the temporal dynamics from the high-dimensional data (see Methods). From the observed variables, the STCN infers the causal factors of the target variable in the sense of Granger causality, which are in turn selected as the effective/spatial variables to significantly improve the prediction robustness and accuracy of the target variable. To validate the accuracy and robustness, the STCN was applied to a representative benchmark system, i.e., a 90-dimensional coupled Lorenz system [21], under different noise conditions.
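As background for the architecture, the causal and dilated convolutions mentioned above can be illustrated with a minimal PyTorch sketch. This is purely illustrative and not the authors' implementation; the module names, kernel size, and residual connection are assumptions.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution padded on the left only, so the output at time t
    depends solely on inputs at times <= t (no leakage from the future)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding length
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))          # pad the past side only
        return self.conv(x)

class TCNBlock(nn.Module):
    """Two causal convolutions with ReLU; doubling the dilation from block to
    block gives an exponentially growing receptive field."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation), nn.ReLU(),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation), nn.ReLU(),
        )
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.net(x) + self.skip(x)                # residual connection
```

Stacking such blocks with dilations 1, 2, 4, ... is the standard way a TCN covers long input histories with few layers.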
Furthermore, the STCN was applied to many real-world datasets in this study and predicted, e.g., (i) the daily number of cardiovascular inpatients in the major hospitals of Hong Kong [22, 23], (ii) the wind speed and solar irradiance in Japan [24], (iii) a ground meteorological dataset in the Houston, Galveston, and Brazoria areas [25], (iv) the population of the plankton community isolated from the Baltic Sea [26, 27], (v) the spread of COVID-19 in the Kanto region of Japan [28], and (vi) the traffic speed at multiple locations in Los Angeles [29]. The results show that the STCN achieves multistep-ahead prediction that is better than that of the other seven existing methods in terms of accuracy and robustness. As a model-free method based only on the observed data, the STCN framework paves a new way to make multistep-ahead predictions by incorporating the primary-conjugate STI equations into an autoencoder TCN form. This framework exploits both the STI transformation and the TCN causal structure, and is thus of great potential for practical applications in many scientific and engineering fields; it also opens a new way to dynamically explore high-dimensional information in machine learning.

We first describe the primary and conjugate STI equations before constructing the STCN (see Methods). For each observed high-dimensional/spatial state $X_t = (x_t^1, x_t^2, \ldots, x_t^D)'$ with $D$ variables and $t = 1, 2, \ldots, m$, we constructed a corresponding delayed/temporal vector $Y_t = (y_t, y_{t+1}, \ldots, y_{t+L-1})'$ for one target variable $y$ to be predicted (e.g., $y = x^k$) by a delay embedding strategy, with $L$ as the embedding dimension satisfying $m > L > 1$ (Fig. 1a), where the symbol $'$ denotes the transpose of a vector. Specifically, through the delay embedding scheme, the matrix $X = [X_1, X_2, \ldots, X_m]$ of the original measurable variables $\{x^1, x^2, \ldots, x^D\}$ and the matrix $Y = [Y_1, Y_2, \ldots, Y_m]$ of the target variable $y = x^k$ are formed, where the entries of $Y$ involving times beyond $m$ (i.e., $y_{m+1}, \ldots, y_{m+L-1}$) are the unknown future values to be predicted. This actually forms the following STI equation set:

$$\Phi\big([X_{t-L}, X_{t-L+1}, \ldots, X_t]\big) = Y_t, \qquad \Phi^{-1}(Y_t) = [X_{t-L}, X_{t-L+1}, \ldots, X_t], \qquad (2)$$

where the first formula is the primary equation with $\Phi: \mathbb{R}^{D \times (L+1)} \to \mathbb{R}^{L}$ and the second formula is the conjugate equation with $\Phi^{-1}: \mathbb{R}^{L} \to \mathbb{R}^{D \times (L+1)}$ (Fig. 1a and 1b). Note that both equations in Eq. (2) hold when some generic conditions are satisfied, based on the delay embedding theorem [14, 15], even if the system is high-dimensional and nonlinear. Clearly, a properly determined function $\Phi$ is the key to solving the STI equations (Eq. (2)); in the STCN, $\Phi$ and $\Phi^{-1}$ are represented by the encoder and decoder TCNs, respectively, so that the autoencoder directly realizes Eq. (2). This structure is capable of exploiting not only the input of spatial information but also the temporally intertwined information among the massive variables of the complex dynamical system, thus greatly enhancing the prediction robustness and accuracy. In this study, each layer of the encoder $\Phi$ and decoder $\Phi^{-1}$ is followed by the ReLU activation function. On the one hand, the STCN is trained through a self-supervised training scheme, i.e., the "consistently self-constrained scheme", for preserving the time consistency of $Y$. Specifically, the loss function combines two terms: a determined-state loss $\mathcal{L}_d$ computed from the observed/known states $\{y_1, y_2, \ldots, y_m\}$ of the target variable $y$, and a future-consistency loss $\mathcal{L}_f$ defined on the future/unknown series $\{y_{m+1}, \ldots, y_{m+L-1}\}$, which preserves the time consistency among the predicted delayed vectors that share the same future values (see Methods for the detailed form). On the other hand, the causal inference and effective variable selection of the STCN are realized through a Granger causality calculation; that is, by comparing the prediction errors between the case "with an observable $x^j$" and the case "without $x^j$", we obtain the causal relation of each $x^j$ to the target variable $y$. The details of the causal inference with effective variable selection are also provided in the Methods section.
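To make the delay-embedding step concrete, the following NumPy sketch builds the spatial states $X_1, \ldots, X_m$ and the Hankel-style delayed matrix of a chosen target variable. It is an illustration under the definitions above, not the authors' code; the function name, argument names, and the NaN convention for the unknown future entries are assumptions.

```python
import numpy as np

def build_sti_matrices(data, target_idx, L):
    """data: (m, D) array of m observed time points for D variables.
    Returns X (D, m) and the delayed target matrix Y (L, m); entries of Y
    that would require values beyond time m are set to NaN -- these are
    exactly the future points the STCN is trained to predict."""
    m, D = data.shape
    X = data.T                               # spatial states X_1..X_m as columns
    y = data[:, target_idx]                  # target variable y = x^k
    Y = np.full((L, m), np.nan)
    for t in range(m):                       # Y_t = (y_t, y_{t+1}, ..., y_{t+L-1})'
        for j in range(L):
            if t + j < m:
                Y[j, t] = y[t + j]           # known value
    return X, Y

# toy usage: D = 5 variables, m = 20 time points, embedding dimension L = 4
rng = np.random.default_rng(0)
X, Y = build_sti_matrices(rng.normal(size=(20, 5)), target_idx=2, L=4)
print(X.shape, Y.shape, np.isnan(Y).sum())   # (5, 20) (4, 4 x 20 matrix) 6 unknown entries
```

The lower-right triangle of NaN entries in $Y$ is what the encoder $\Phi$ fills in, while the known upper part supplies the determined-state loss.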
To illustrate the mechanism and the basic idea of the STCN framework, a 90-dimensional coupled Lorenz model [21], written compactly as $\dot{x}(t) = f(x(t); P)$ with parameter set $P$ (Eq. (4)), was adopted as a benchmark; the ground-truth interaction network of six representative variables is shown in Fig. S4(i). Noise-free situation: First, by applying the STCN to the noise-free situation, a series of predictions is presented in Fig. 2, including the cross-wings cases (Fig. 2d), i.e., the known and to-be-predicted series distributed across two wings of the attractor, and the simpler cases (Fig. 2c), i.e., the known and to-be-predicted series distributed within a single wing of the attractor. For each three-dimensional case (Fig. 2a and 2b), there were three target variables, each of which was randomly selected from $\{x^1, x^2, \ldots, x^{90}\}$. In one prediction, we used the 90-dimensional data from the initial 50 steps as the known information/input, and the STCN output 15-step-ahead data for the target variables, i.e., $D = 90$, $m = 50$, and $L - 1 = 15$. Notably, the predicted values (the red curves) for each target variable were obtained by a one-time prediction; that is, the STCN provides an efficient way to obtain a whole horizon (15 steps) of future information. Clearly, on the basis of the 90-dimensional short-term time series, the STCN inferred the top 30 effective/causal variables of the targets and significantly increased the performance in both accuracy and robustness when predicting the targets with these 30 variables (Fig. 2c and 2d). Note that the training and prediction of the STCN are based only on the observed data. Here and below, to validate the effectiveness of the STCN (Eq. (2)), its prediction performance was compared with that of seven representative methods, i.e., the LSTM network [5, 6], Holt's exponential smoothing (HES) [3, 4], autoregression (AR) [30], the autoregressive integrated moving average (ARIMA) [31], the radial basis function network (RBFN) [32], multiview embedding (MVE) [33], and support vector regression (SVR) [34, 35]. Additionally, Table 1 shows that the STCN performs better than the other prediction methods on the noise-free cases of the 90-dimensional Lorenz system; that is, the accuracy of the STCN is the best in terms of the root mean square error (RMSE) and the Pearson correlation coefficient (PCC). Moreover, the performance of the eight prediction methods on the datasets without causality-based variable selection is shown in Supplementary Table S1. Additive-noise situation: Second, the STCN was applied to the noisy cases of the 90D Lorenz system (Eq. (4)) with additive white noise (noise strength 0.5) to predict the same target variable, again with $m = 50$ and $L - 1 = 15$. Specifically, the cross-wings case is exhibited in Fig. 2f, and the single-wing case is presented in Fig. 2e. After the selection of the top 30 effective/causal variables, the prediction accuracy of the STCN improves significantly and is better than that of the other seven methods for both the single-wing and cross-wings cases. Therefore, although the prediction performance slightly deteriorates compared with the noise-free situation (Fig. 2c and 2d), the STCN still captures the dynamics efficiently and is much more robust when the system is perturbed by noise. The STCN achieves satisfactory performance even with noisy data, compared with traditional approaches, because of two characteristics: simultaneously solving the conjugated pair of STI equations in Eq. (2), and performing effective variable selection among all observables.
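The Granger-style effective-variable selection used above can be sketched as a leave-one-out comparison of prediction errors: a predictor is fitted with and without each candidate variable, and the variables whose removal most degrades the prediction of the target are ranked as its effective/causal drivers. The ridge regressor, lag length, and train/test split below are stand-ins for the STCN predictor and are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def effective_variable_ranking(data, target_idx, lag=3, top_k=30):
    """Rank candidate variables by how much leaving each one out increases
    the one-step prediction error of the target (a Granger-style criterion).
    data: (m, D) array; returns indices of the top_k effective variables."""
    m, D = data.shape
    # lagged design matrix: row t uses data[t-1], data[t-2], ..., data[t-lag]
    Xlag = np.hstack([data[lag - k - 1 : m - k - 1] for k in range(lag)])
    ytgt = data[lag:, target_idx]

    def prediction_error(kept_vars):
        mask = np.zeros(D * lag, dtype=bool)
        for c in kept_vars:
            mask[c::D] = True                 # keep every lag of the kept variables
        half = len(ytgt) // 2                 # fit on the first half, test on the rest
        model = Ridge(alpha=1.0).fit(Xlag[:half][:, mask], ytgt[:half])
        pred = model.predict(Xlag[half:][:, mask])
        return np.sqrt(np.mean((pred - ytgt[half:]) ** 2))

    base = prediction_error(list(range(D)))   # error with every variable included
    # importance of variable j = error increase when j is left out
    scores = np.array([prediction_error([c for c in range(D) if c != j]) - base
                       for j in range(D)])
    return np.argsort(scores)[::-1][:top_k]
```

In the Lorenz experiment, a ranking of this kind is what reduces the 90 observables to the top 30 effective variables that are fed back into the STCN.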
In the era of big data, high-dimensional data are ubiquitous in many fields. Predicting the future values of key variables by exploiting the relevant high-dimensional information is of great importance for studying complex systems and forecasting potential risks. The STCN method was applied to the following high-dimensional real-world datasets and was compared with the seven existing methods. The detailed performance of all the prediction methods is exhibited in Table 1. The specific parameters and variables, together with the known data used for each dataset, are summarized in Supplementary Table S2. The descriptions of the datasets are given in Supplementary Note 5.

The first real-world dataset contains the number series of cardiovascular inpatients in major hospitals in Hong Kong and the index series of air pollutants, i.e., the daily concentrations of nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), and respirable suspended particulates (Rspar), the mean daily temperature, the relative humidity, etc., which were obtained from air monitoring stations in Hong Kong from 1994 to 1997 [22]. Given the high correlation between cardiovascular inpatient numbers and air pollutants [36], the STCN was applied to forecast daily cardiovascular disease admissions based on a set of air pollutant indices (Fig. 3). Considering the delay effect of every potential factor as well as a dummy vector for the weekday effect [36], we have a 14-dimensional system ($D = 14$), with the number of known time points set as $m = 70$ (days) and the prediction horizon set as $L - 1 = 25$ (days). By inferring and selecting the top 11 effective variables, the prediction accuracy of the STCN increases significantly and is better than that of the other methods. The causal relations among cardiovascular inpatients and air pollutants inferred by the STCN are provided in Fig. 3i.

The STCN was then applied to a dataset collected in a long-term experiment with a marine plankton community isolated from the Baltic Sea from 1990 to 1997 [26, 27, 37], including the species-abundance time series of bacteria, several phytoplankton species, herbivorous and predatory zooplankton species, and detritivores. These plankton species constituted a food web, which was cultured in a laboratory mesocosm and sampled twice a week for more than 2,300 days. As shown in Fig. 3e-3h, the STCN predicts the dynamic trend of the abundances of two target species (cyclopoids and rotifers), with parameter settings $D = 12$ (12 plankton species in total), $m = 18$ (known abundance information of 18 steps), and $L - 1 = 6$ (6-step-ahead prediction). By selecting the top 8 effective variables, the STCN achieves a higher prediction accuracy, i.e., RMSE = 0.542 and PCC = 0.879 for cyclopoids and RMSE = 0.553 and PCC = 0.953 for rotifers, than the other methods. In addition, the causal/food-chain network among four species, i.e., rotifers, cyclopoids, picocyanobacteria, and protozoa, was inferred by the STCN (Fig. 3j).

Wind speed is one of the weather variables with highly time-varying characteristics in nonlinear meteorological systems and is thus extremely difficult to predict. The wind speed dataset was collected from the Japan Meteorological Agency [24]. Among the 155 wind stations distributed all around Japan, we selected one target station near Tokyo. As shown in Fig. 4, the STCN predicted the dynamics of the wind speed at the target station with parameter settings $D = 155$, $m = 64$, and $L - 1 = 26$ (Fig. 4a and 4c). After inferring and selecting the 70 most effective variables, the prediction accuracy of the STCN increases significantly, as shown by the comparisons in Fig. 4b and 4d. Based on the effective variables, the predictions of the STCN are better than those of the other methods. Long-term predictions were also performed by selecting the top 70 effective variables and are provided in Fig. 4e and 4f, in which the wind speed at the target station was continuously predicted for a whole season (3 months). The predictions for more periods are provided in Fig. S5.
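For reference, the two evaluation metrics quoted throughout (RMSE and PCC, as in Table 1) can be computed as follows. This is a plain-scale sketch; whether the series are normalized before scoring is not specified here and is left as an assumption.

```python
import numpy as np

def rmse(pred, truth):
    """Root mean square error between predicted and true series."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def pcc(pred, truth):
    """Pearson correlation coefficient between predicted and true series."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.corrcoef(pred, truth)[0, 1])

# toy usage on a 6-step-ahead prediction, as in the plankton example
truth = np.array([0.8, 1.1, 1.5, 1.2, 0.9, 0.7])
pred  = np.array([0.7, 1.0, 1.6, 1.3, 1.0, 0.6])
print(rmse(pred, truth), pcc(pred, truth))
```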
The STCN was applied to the prediction of traffic speed (miles/h) based on a dataset collected from 207 loop detectors on the 134 highway of Los Angeles County, in which the traffic speed was recorded every five minutes from Mar 1st, 2012, to Jun 30th, 2012 [29]. In such a dynamic system, each detector was regarded as a variable, and the traffic speed at a detector is mainly determined by the observed values of its nearest-neighbor sensors. Consequently, the 55 nearest-neighbor detectors of each target detector were selected to constitute a subsystem. By applying the STCN, the multistep predictions ($L - 1 = 19$ time points ahead) at four target locations/sensors were obtained based on the 55 neighboring variables ($D = 55$; Fig. 5a, 5c, 5e, and 5g) and the top 30 effective variables ($D = 30$; Fig. 5b, 5d, 5f, and 5h) with $m = 60$ known time points. Based on the effective variables, the predictions of the STCN are better than those of the other methods. Supplementary Movie S1 shows the dynamic change in the predicted traffic speed.

As a highly infectious disease, COVID-19 had a reproduction number estimated to be as high as 6.47 in the early stage [38]. Many studies have suggested that early interventions, such as the use of masks, social distancing, self-isolation, quarantine, and even the lockdown of entire regions and communities, are effective in containing or at least mitigating the spread of the virus [39]. It is thus crucial to predict the spread of COVID-19 so that a timely public health strategy can be carried out to reduce the magnitude and spread of the pandemic. However, the complex characteristics of both biological and social systems make the real-time prediction of infectious disease outbreaks challenging. The STCN provides a data-driven approach to predict the dynamic change in new cases. As shown in Fig. 6, the STCN predicted the number of COVID-19 patients in several cities with severe epidemics in the Kanto region of Japan [28].

The last real-world dataset contains 72-dimensional ground meteorological data ($D = 72$) recorded monthly in the Houston, Galveston, and Brazoria areas [25] from 1998 to 2004. As shown in Fig. S2, the relative humidity and geopotential height were accurately predicted. For each target index, the STCN was applied to make a 17-step-ahead prediction ($L - 1 = 17$) based on the previous $m = 50$ steps of the 72-dimensional data. The results predicted by the STCN are better than those predicted by the other methods (Table 1).

In this study, we proposed the STCN framework to make multistep-ahead predictions with causal factor inference based on high-dimensional data in an accurate and robust manner. The delay embedding theorem ensures that each spatiotemporal matrix $[X_{t-L}, X_{t-L+1}, \ldots, X_t]$ and each temporal vector $Y_t$ correspond to each other via a smooth map, and thus we have the primary and conjugate STI equations (Eq. (2) or (10)) [14, 15]. That is, the primary STI equation acts as an encoder that transforms the spatiotemporal information of a high-dimensional matrix into the temporal information of the target variable, while the conjugate STI equation acts as a decoder that maps this temporal information back to the high-dimensional states [40, 41]. In summary, compared with traditional prediction methods, the STCN possesses the following advantages.
First, the STCN achieves multistep-ahead prediction with high-dimensional data due to the STI nonlinear transformation from high-dimensional spatiotemporal information into temporal information. Second, in practical applications, by simultaneously solving a conjugated pair of STI equations corresponding to a spatiotemporal convolutional autoencoder, the STCN is robust and performs well on multiple datasets, including noise-perturbed cases, which widely exist in real-world systems. In addition, the STCN has a solid theoretical background based on the STI equations and the TCN causal convolution structure, and it opens a new way to explore observed high-dimensional information in a dynamical manner for machine learning. The results of the applications to a variety of real-world problems demonstrated the effectiveness and efficiency of our method. Therefore, the STCN opens a new way for short-term prediction in terms of computational efficiency, accuracy, and robustness, and it has high potential in real-world applications as a model-free method based only on the observed data.

Abbreviations: STCN: spatiotemporal convolutional network; LSTM: long short-term memory; AR: autoregression; RMSE: root-mean-square error; PCC: Pearson correlation coefficient; STI: spatial-temporal information.

References:
Coexpression network analysis in chronic hepatitis B and C hepatic lesions reveals distinct patterns of disease progression to hepatocellular carcinoma
Robust regression and outlier detection
Forecasting seasonals and trends by exponentially weighted moving averages
Exponential smoothing for predicting demand
Transductive LSTM for time-series prediction: An application to weather forecasting
Long short-term memory
Recurrent neural networks and robust time series prediction
Time series analysis. The Oxford Handbook of
Data based identification and prediction of nonlinear and complex dynamical systems
Time series prediction: forecasting the future and understanding the past. Routledge
Autoreservoir computing for multistep ahead prediction based on the spatiotemporal information transformation
Randomly distributed embedding making short-term high-dimensional data predictable
Nonlinear prediction of chaotic time series
Detecting strange attractors in turbulence. Dynamical systems and turbulence
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Learning phrase representations using RNN encoder-decoder for statistical machine translation
A convolutional encoder model for neural machine translation
Temporal Convolutional Networks for Action Segmentation and Detection
Language modeling with gated convolutional networks
A generalized Lorenz system
Air pollution and hospital admissions for respiratory and cardiovascular diseases in Hong Kong
Statistical estimation in varying coefficient models
Predicting ramps by integrating different sorts of information
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond
Chaos in a long-term experiment with a plankton community
Coupled predator-prey oscillations in a chaotic food web
COVID-19 Open-Data: curating a fine-grained, global-scale data repository for SARS-CoV-2
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Bootstrap prediction intervals for autoregression
Distribution of residual autocorrelations in autoregressive-integrated moving average time series models
Radial basis function networks 2: new advances in design
Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality
Support vector machines: theory and applications
Calibration of ε-insensitive loss in support vector machines regression
Semi-parametric estimation of partially linear single-index models
A long-term series of a planktonic foodweb: a case of chaotic dynamics. Internationale Vereinigung für Theoretische und Angewandte Limnologie
Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions
Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts
Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers
Dynamic network biomarker indicates pulmonary metastasis at the tipping point of hepatocellular carcinoma

We thank the Japan Meteorological Agency, which provided the datasets of wind speeds used in this study (available via the Japan Meteorological Business Support Center). This work was
Author contributions: LC and RL conceived the research. HP and PC performed the numerical simulations and real-data analysis. All authors wrote the paper. All authors read and approved the final manuscript. The authors declare that they have no competing interests.