key: cord-0072173-3m5z19wy
authors: Papana, Angeliki
title: Connectivity Analysis for Multivariate Time Series: Correlation vs. Causality
date: 2021-11-25
journal: Entropy (Basel)
DOI: 10.3390/e23121570
sha: da238be484118a440bab43de9ae8049a27bb637d
doc_id: 72173
cord_uid: 3m5z19wy

The study of the interdependence relationships between the variables of an examined system is of great importance and remains a challenging task. There are two distinct cases of interdependence. In the first case, the variables evolve in synchrony, connections are undirected, and connectivity is examined with symmetric measures, such as correlation. In the second case, one variable drives another and the two are connected by a causal relationship; directed connections therefore require the interrelationships to be determined with causality measures. The main open question is the following: which should be applied to infer the connectivity network of an examined system, symmetric correlation measures or directional causality measures? Using simulations, we demonstrate the performance of different connectivity measures in the presence of contemporaneous and/or temporal dependencies. Results indicate that correlation measures are sensitive to temporal dependencies in the data. On the other hand, causality measures do not spuriously indicate causal effects when the data present only contemporaneous dependencies. Finally, we highlight the need for effective instantaneous causality measures, since they can handle contemporaneous and causal effects at the same time. Results based on instantaneous causality measures are promising; however, further investigation is required to achieve an overall satisfactory performance.

There are various challenges in the analysis of multivariate high-dimensional systems, such as financial and neurophysiological data. The goal of each application and the features of the examined data should be considered in order to determine a suitable connectivity analysis scheme. For example, financial time series are nonstationary, contain nonlinearities and exhibit volatility clustering, whereas data from neuroscience experiments may have a high temporal resolution and may be subject to artifacts and periodic respiratory or cardiac noise.

Connectivity analysis focuses on identifying the interdependence relationships between the variables of a complex system. Connectivity measures can be subdivided into two main categories depending on whether they quantify the direction of a relationship. Nondirectional measures assume that the variables evolve in synchrony, and connectivity is examined with symmetric measures, such as dependence measures. Directed measures, such as Granger causality measures [1] , instead quantify the causal effects among the variables, assuming that causes precede their effects in time. Both categories are further subdivided into model-based and model-free connectivity measures, and measures in both categories can be computed in the time, frequency or phase domain. Spectral measures of dependence infer the dependence between oscillatory components of the examined data.

Given the abundance of connectivity measures developed so far, there is a pressing need to compare them and clarify the usefulness of each method. Comparisons are mainly performed in the context of specific applications. Indicatively, comparisons of correlation measures can be found in [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] , among synchronization measures for the remaining variables of the system which infers about the directionality of the information flow [85] [86] [87] .

Table 1 lists some well-known correlation measures in the time and frequency domains.

Table 1. Non-directional connectivity measures.

• Pearson product-moment correlation coefficient [43]
• Spearman rank correlation coefficient [44]
• Kendall's rank correlation coefficient [45]
• Hoeffding's test of independence [46]
• Biweight midcorrelation [88]
• Coefficient of determination [48]
• Distance correlation [49, 50]
• Partial distance correlation [51]
• Yule's Q [52]
• Yule's Y [53]
• CANOVA [9]
• Randomized Dependence Coefficient [56]
• Mutual information [65] [66] [67]
• Nonlinear correlation information entropy [64]
• Entropy correlation coefficient [68]
• Entropy coefficient of determination [69]
• Maximal information coefficient [70]
• Partial maximal information coefficient [71]
• Coherence [73]
• Mean phase coherence [12, 79]
• Phase locking value [12, 78]
• Determinism [83, 84]

The different correlation measures have been widely applied in many fields, such as finance, neurophysiology, meteorology, biology and engineering. Applications include the identification of genomic associations [89, 90] , the examination of the association of proteins in the pathogenesis of Parkinson's disease [91] , noise reduction [92, 93] , the identification of disease-specific biomarker genes [94] , multimodal image registration [95] , portfolio optimization [96, 97] , investment decisions [98] , wind power combination prediction [99] , genetic interactions [100] , the development of artificial neural network models for water treatment plants [101] , testing the resilience of tourism economies and islands to the global financial crisis [102] , electroencephalogram (EEG) analysis [103] , the study of financial markets [104, 105] , the recognition of multiple positive emotions from brain activity [106] , the identification of meteorological parameters that play a major role in the transmission of infectious diseases such as COVID-19 [107] and stock trend prediction [108, 109] .

Directional connectivity measures seek to infer the direction of a relationship from the data samples, relying on the principle that causes precede their effects. The most common approach to causal discovery is Granger causality, which builds on probabilistic causation, i.e., the concept that causes change the probabilities of their effects [1, 110] .

Model-based directional approaches typically assume that interactions are linear. The standard linear Granger causality test is the pioneering technique: based on autoregressive models, it determines whether the prediction of the target (driven) variable can be improved by exploiting past values of the source (driving) variable [1] .
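
To make the restricted/unrestricted comparison concrete, the sketch below implements the bivariate linear Granger causality F-test with NumPy and SciPy. It is an illustrative assumption, not the code used in this study: the function names, the lag order p = 2 and the toy system in which Y drives X with a one-step delay are all hypothetical.

```python
import numpy as np
from scipy import stats

def lagged_design(series_list, p):
    """Intercept plus lags 1..p of every series, aligned with the targets s[p:]."""
    n = len(series_list[0])
    cols = [s[p - k:n - k] for s in series_list for k in range(1, p + 1)]
    return np.column_stack([np.ones(n - p)] + cols)

def rss(target, design):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.sum((target - design @ beta) ** 2))

def granger_f_test(x, y, p=2):
    """F-test of H0: past values of y do not improve the prediction of x (no Y -> X)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    target = x[p:]
    rss_r = rss(target, lagged_design([x], p))   # restricted: own past only
    design_u = lagged_design([x, y], p)          # unrestricted: adds the past of y
    rss_u = rss(target, design_u)
    df1, df2 = p, len(target) - design_u.shape[1]
    f = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return f, stats.f.sf(f, df1, df2)

# Usage: y drives x with a one-step delay (hypothetical toy system).
rng = np.random.default_rng(0)
n = 2000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    x[t] = 0.3 * x[t - 1] + 0.6 * y[t - 1] + rng.standard_normal()
f_yx, p_yx = granger_f_test(x, y)   # tests Y -> X
f_xy, p_xy = granger_f_test(y, x)   # tests X -> Y
```

The F-statistic compares the residual sums of squares of the two autoregressions; a small p-value indicates that the past of Y improves the prediction of X.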

Various model-based extensions of the standard Granger causality test have been developed. Conditional Granger causality is its multivariate extension, which exploits all the available information in the observed data [111] . Partial Granger causality extends conditional Granger causality to address the problem of exogenous inputs and latent variables [112] . Further parametric causality methods have been introduced, such as methods based on radial basis functions [113] , kernel functions [114] and nonlinear autoregressive exogenous models [115] .
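
The conditional (multivariate) version can be sketched by adding the past of the conditioning variables Z to both regressions. The sketch below computes a conditional Granger causality index as the log-ratio of the restricted to the unrestricted residual variance; the names, the lag order and the common-driver toy system are hypothetical and serve only to show how conditioning on Z removes a spurious Y → X link.

```python
import numpy as np

def lagged_design(series_list, p):
    """Intercept plus lags 1..p of every series, aligned with the targets s[p:]."""
    n = len(series_list[0])
    cols = [s[p - k:n - k] for s in series_list for k in range(1, p + 1)]
    return np.column_stack([np.ones(n - p)] + cols)

def residual_variance(target, design):
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.var(target - design @ beta))

def cgci(x, y, z_list=(), p=2):
    """Conditional Granger causality index Y -> X | Z = ln(var_restricted / var_unrestricted)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z_list = [np.asarray(z, float) for z in z_list]
    target = x[p:]
    var_r = residual_variance(target, lagged_design([x] + z_list, p))       # no past of y
    var_u = residual_variance(target, lagged_design([x, y] + z_list, p))    # with past of y
    return float(np.log(var_r / var_u))

# Usage: z drives y at lag 1 and x at lag 2, so the past of y spuriously "predicts" x.
rng = np.random.default_rng(1)
n = 2000
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
y = np.r_[0.0, z[:-1]] + rng.standard_normal(n)
x = np.r_[0.0, 0.0, z[:-2]] + rng.standard_normal(n)
bivariate = cgci(x, y)          # no conditioning: spurious Y -> X
conditional = cgci(x, y, [z])   # conditioning on Z: close to zero
```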

Non-parametric extensions of Granger causality to nonlinear cases in the time domain include the Baek and Brock test [116] , the Hiemstra and Jones test [117] and the Diks and Panchenko test [118] , while [119] extends the Hiemstra and Jones test to multivariate settings.

Numerous directional measures stem from information theory. These model-free approaches can detect both linear and nonlinear interactions. Transfer entropy is the best-known information measure for studying directed interactions [120] . Partial transfer entropy extends the bivariate transfer entropy to the multivariate case, where confounding variables are also considered in the estimation [121, 122] . Further information-based causality measures built on the non-uniform embedding scheme include the partial transfer entropy on non-uniform embedding [123] , (partial) mutual information on mixed embedding [124, 125] and transfer entropy based on low-dimensional approximations of conditional mutual information [126, 127] .
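
As a minimal illustration of transfer entropy, the following sketch uses a plug-in (histogram) estimator with equiprobable bins and history length 1. This is an assumed, deliberately simple estimator: plug-in estimates are biased, and the measures considered in this study rely on nearest-neighbor estimation and non-uniform embedding instead. The toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def quantize(s, bins):
    """Map a series to symbols 0..bins-1 using equiprobable (quantile) bins."""
    edges = np.quantile(s, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(s, edges)

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate (nats) of TE Y -> X with history length 1:
    sum p(x_{t+1}, x_t, y_t) * log[p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t)]."""
    xs = quantize(np.asarray(x, float), bins)
    ys = quantize(np.asarray(y, float), bins)
    triples = list(zip(xs[1:], xs[:-1], ys[:-1]))     # (x_{t+1}, x_t, y_t)
    n = len(triples)
    n_fxy = Counter(triples)
    n_fx = Counter((f, p) for f, p, _ in triples)      # (x_{t+1}, x_t)
    n_xy = Counter((p, s) for _, p, s in triples)      # (x_t, y_t)
    n_x = Counter(p for _, p, _ in triples)            # x_t
    te = 0.0
    for (f, p, s), k in n_fxy.items():
        # p(f | p, s) / p(f | p) = [k / n_xy] / [n_fx / n_x]
        te += (k / n) * np.log(k * n_x[p] / (n_xy[(p, s)] * n_fx[(f, p)]))
    return te

# Usage: x_{t+1} is a noisy copy of y_t, so information flows from Y to X only.
rng = np.random.default_rng(2)
y = rng.standard_normal(5000)
x = np.r_[0.0, y[:-1]] + 0.3 * rng.standard_normal(5000)
te_yx = transfer_entropy(x, y)   # Y -> X
te_xy = transfer_entropy(y, x)   # X -> Y
```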

Linear cross-correlation is the simplest and best-known synchronization measure; it is defined as the covariance of the two signals divided by the product of their standard deviations. Event synchronization is another simple and computationally efficient method that quantifies synchronicity and time delay patterns between signals [128] .
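
A sketch of the lagged normalized cross-correlation, in which the lag of the peak indicates the delay between the two signals. Normalizing with full-series means and standard deviations is an assumed simplification; at lag zero it reduces to the Pearson correlation coefficient. The delayed toy signal is hypothetical.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """r_xy(tau) ~ cov(x_t, y_{t+tau}) / (sigma_x * sigma_y) for tau in [-max_lag, max_lag].
    Means and standard deviations are taken over the full series."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    n = len(zx)
    r = {}
    for tau in range(-max_lag, max_lag + 1):
        if tau >= 0:
            r[tau] = float(np.mean(zx[:n - tau] * zy[tau:]))
        else:
            r[tau] = float(np.mean(zx[-tau:] * zy[:n + tau]))
    return r

# Usage: y repeats x with a delay of 3 samples plus a little noise.
rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
y = np.roll(x, 3) + 0.1 * rng.standard_normal(1000)
r = cross_correlation(x, y, max_lag=5)
best_lag = max(r, key=r.get)   # lag with the largest correlation
```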

Various nonlinear interdependence measures have been developed based on nonlinear prediction theory; they use neighborhoods of the reconstructed points in state space to determine nonlinear driver-response relationships [129] [130] [131] [132] [133] [134] . Extending these, the (conditional) extended Granger causality additionally fits a linear model to all the points in the neighborhood of each reference point of the reconstructed state space [135] . A more recent bivariate causality method based on nonlinear state space reconstruction can be found in [136] ; it is defined through empirical dynamic modeling (convergent cross mapping) and was introduced to infer causality in complex systems that do not satisfy the separability assumption, i.e., when the cause and the effect are non-separable.
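
The cross-mapping step behind convergent cross mapping can be sketched as follows: reconstruct the shadow manifold of the putative effect by delay embedding and estimate the putative cause from the E + 1 nearest neighbors of each point. If X drives Y, the manifold of Y should allow X to be recovered. This is an assumed, simplified version (fixed E = 2 and tau = 1, no exclusion of temporally close neighbors, a single library size); the actual criterion in convergent cross mapping is that the skill converges as the library length grows.

```python
import numpy as np

def embed(s, E, tau=1):
    """Delay embedding: the row for time t is (s_t, s_{t-tau}, ..., s_{t-(E-1)tau})."""
    s = np.asarray(s, float)
    n = len(s) - (E - 1) * tau
    return np.column_stack([s[(E - 1 - i) * tau:(E - 1 - i) * tau + n] for i in range(E)])

def ccm_skill(cause, effect, E=2, tau=1):
    """Cross-map skill: correlation between `cause` and its estimate from the shadow
    manifold of `effect` (simplex-style exponential weights over E+1 neighbors)."""
    M = embed(effect, E, tau)
    target = np.asarray(cause, float)[(E - 1) * tau:]
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                     # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :E + 1]          # E+1 nearest neighbors per point
    dist = np.take_along_axis(d, idx, axis=1)
    w = np.exp(-dist / np.maximum(dist[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * target[idx]).sum(axis=1)
    return float(np.corrcoef(estimate, target)[0, 1])

# Usage sketch (hypothetical coupled logistic maps in which X drives Y):
x, y = np.empty(500), np.empty(500)
x[0], y[0] = 0.4, 0.2
for t in range(499):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
skill_x_to_y = ccm_skill(cause=x, effect=y)   # recover X from the manifold of Y
skill_y_to_x = ccm_skill(cause=y, effect=x)   # recover Y from the manifold of X
```

In practice both skills would be recomputed for increasing library lengths, and a causal link X → Y is supported when the first one increases and saturates.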

Graphical models were suggested by [137] to account for probabilistic independence relationships between variables without relying on temporal information. Probabilistic graphical models combine graph theory and probability theory.

The field of causal discovery was established by [138, 139] , where a causal interpretation of graphs was achieved based on Bayesian network models. The PC algorithm is the pioneering structure-learning algorithm for directed graphs [138] under the assumption of the causal Markov condition. After the introduction of the PC algorithm and the Fast Causal Inference algorithm [140] , an ensemble of different causal discovery methods based on graphical models was developed [141] [142] [143] [144] . Markov discovery algorithms such as the PC algorithm cannot be applied directly to time series; therefore, [145] adapted the Fast Causal Inference algorithm to time series.
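
The skeleton (adjacency) phase of the PC algorithm can be sketched as follows, assuming Gaussian data and Fisher-z tests on partial correlations as the conditional independence test. An edge is removed as soon as some conditioning set of the current size renders the pair independent; the orientation phase is omitted. The sketch treats the samples as independent, which is precisely why such algorithms need adaptation for time series. The function names and the toy chain are hypothetical.

```python
import itertools
import numpy as np
from scipy import stats

def partial_corr_from_matrix(C, i, j, S):
    """Partial correlation of variables i and j given the set S, from correlation matrix C."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])   # precision matrix of the subset
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def fisher_z_p_value(r, n, k):
    """Two-sided p-value of H0: (partial) correlation is 0, given k conditioning variables."""
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - k - 3)
    return 2 * stats.norm.sf(abs(z))

def pc_skeleton(data, alpha=0.01):
    """Skeleton phase of the original (order-dependent) PC algorithm for i.i.d. Gaussian data."""
    n, d = data.shape
    C = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(d)) - {i} for i in range(d)}
    level = 0                                  # size of the conditioning sets
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):
                for S in itertools.combinations(sorted(adj[i] - {j}), level):
                    if fisher_z_p_value(partial_corr_from_matrix(C, i, j, S), n, level) > alpha:
                        adj[i].discard(j)      # i and j independent given S: drop the edge
                        adj[j].discard(i)
                        break
        level += 1
    return {(i, j) for i in range(d) for j in adj[i] if i < j}

# Usage: chain X0 -> X1 -> X2 plus an unrelated X3; expected skeleton {(0, 1), (1, 2)}.
rng = np.random.default_rng(10)
n = 2000
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)
x3 = rng.standard_normal(n)
skeleton = pc_skeleton(np.column_stack([x0, x1, x2, x3]), alpha=0.001)
```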

The Peter Clark momentary conditional independence (PCMCI) algorithm is a causal discovery method that uses linear or nonlinear conditional independence tests to determine causal networks from multivariate time series data [146] . It was designed for climate applications and can therefore handle strong interdependencies in the sample. An extension aiming to improve computational efficiency is the Fast Approximate Causal Discovery Algorithm [147] .

Data from the time domain can be converted to the frequency domain with mathematical operators such as the Fourier transform, which expresses a time function as a sum or integral of sine waves of different frequencies. Frequency-domain causality measures have been widely applied to the analysis of neurophysiological data. Most of the developed spectral measures are based on linear models and can thus detect only linear causal effects in the frequency domain, such as Geweke's spectral Granger causality [111] , the directed transfer function [148] , the partial directed coherence [149] , the direct Directed Transfer Function [150] , the Generalized Partial Directed Coherence [151] and the Phase Slope Index [152] . Recently, a frequency-domain approach for testing short- and long-run causality has been introduced in [153] .
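
As an illustration of a linear spectral measure, partial directed coherence can be computed directly from VAR coefficient matrices A_1, ..., A_P through Ā(f) = I − Σ_r A_r exp(−i2πfr), normalizing each column of |Ā(f)|. The sketch below assumes the coefficients are already known (in practice they are estimated from the data); the toy VAR(1) is hypothetical, and the 256 frequencies in [0, 0.5] mirror the grid used later in this study.

```python
import numpy as np

def pdc(coefs, freqs):
    """PDC_{j -> i}(f) = |Abar_ij(f)| / sqrt(sum_k |Abar_kj(f)|^2) for a VAR with
    coefficient matrices coefs[r - 1] (d x d), r = 1..P; entry (i, j) is the effect j -> i."""
    coefs = np.asarray(coefs, float)
    P, d, _ = coefs.shape
    out = np.empty((len(freqs), d, d))
    for m, f in enumerate(freqs):
        A_bar = np.eye(d, dtype=complex)
        for r in range(1, P + 1):
            A_bar -= coefs[r - 1] * np.exp(-2j * np.pi * f * r)
        col_norm = np.sqrt(np.sum(np.abs(A_bar) ** 2, axis=0, keepdims=True))
        out[m] = np.abs(A_bar) / col_norm        # each column normalized to unit length
    return out

# Usage: variable 0 drives variable 1 (entry (1, 0)); there is no link 1 -> 0.
freqs = np.linspace(0, 0.5, 256)
A1 = [[0.5, 0.0],
      [0.4, 0.5]]
values = pdc([A1], freqs)
```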

Nonparametric methods have also been employed, such as the nonparametric approach based on Fourier and wavelet transforms in [154] , the nonparametric partial directed coherence [155] and the DEKF-based extension of partial directed coherence, where the parameters of the time-varying autoregressive model are estimated with the Dual Extended Kalman Filter (DEKF) [156] . Further, the nonlinear partial directed coherence models the nonlinear relationships of the examined time series using nonlinear models and generalized frequency response functions [157] .

Granger causality relationships are examined by considering the past values of the involved variables. However, the prediction of the target variable may in some cases be improved by also including the current value of the source variable. In such a case, the instantaneous causality relation between the source and target variable should be considered [158] . For example, contemporaneous relationships are present if the residuals of the regression models fitted to the data are correlated.

Within the framework of stationary autoregressive modeling, instantaneous causality is usually tested with Wald tests for zero restrictions on the covariance matrix of the innovations. An extended Granger causality that accounts for zero-lag effects in the linear regression schemes implemented by the VAR model is proposed in [159] . Instantaneous causality in the presence of non-constant unconditional variance is examined in [160] . Instantaneous causality measures defined on structural vector causal models are presented in [161] [162] [163] [164] .
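
A minimal sketch of testing instantaneous causality in a VAR setting: fit a VAR(p) by least squares and test whether the correlation between the innovations of the two variables is zero. As an assumption, a likelihood-ratio statistic, −n ln(1 − r²), compared with a chi-squared distribution with one degree of freedom, stands in for the Wald test mentioned above (the two are asymptotically equivalent for this restriction); the function names and toy systems are hypothetical.

```python
import numpy as np
from scipy import stats

def var_residuals(data, p):
    """OLS fit of a VAR(p) with intercept; returns the (n - p) x d matrix of residuals."""
    n, d = data.shape
    lags = [data[p - k:n - k] for k in range(1, p + 1)]
    design = np.column_stack([np.ones(n - p)] + lags)
    target = data[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def instantaneous_causality_test(x, y, p=1):
    """Test H0: the VAR innovations of x and y are uncorrelated (no instantaneous causality)."""
    resid = var_residuals(np.column_stack([x, y]), p)
    r = np.corrcoef(resid, rowvar=False)[0, 1]
    lr = -len(resid) * np.log(1 - r ** 2)   # likelihood-ratio statistic, ~ chi2(1) under H0
    return float(r), float(stats.chi2.sf(lr, df=1))

# Usage: a shared zero-lag shock c_t enters both equations.
rng = np.random.default_rng(11)
n = 2000
c = rng.standard_normal(n)
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal() + c[t]
    y[t] = 0.3 * y[t - 1] + rng.standard_normal() + c[t]
r_inst, p_inst = instantaneous_causality_test(x, y)

# Contrast: a purely lagged link x_{t-1} -> y_t and no shared shock.
x2 = rng.standard_normal(n)
y2 = np.r_[0.0, 0.6 * x2[:-1]] + rng.standard_normal(n)
r_lag, p_lag = instantaneous_causality_test(x2, y2)
```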

A frequency-domain causality framework that considers instantaneous effects is introduced in [165] . An instantaneous causality measure that relies on the information versions of directed transfer entropy and partial directed coherence, estimated after decomposing the coherencies and partial coherencies, is presented in [166] . Instantaneous Granger causality measures based on the Hilbert-Huang transform are introduced in [167] .

Compensated transfer entropy is a nonlinear causality measure that accounts for contemporaneous relationships [168, 169] . A multivariate Granger causality measure that includes instantaneous variables in the conditioning set, based on a decomposition of conditional directed information, is discussed in [170] . Partial mutual information from mixed embedding that also considers zero-lag effects, denoted PMIME0, addresses the problem of determining the connectivity network from multivariate time series in the presence of unobserved variables [171] . Finally, PCMCI+ [172] is a causality measure based on conditional independence tests that searches for lagged and contemporaneous parents in order to infer lagged and contemporaneous causal relationships.

Table 2 lists well-known directional connectivity measures in the time and frequency domains; its final entries are PMIME0 [171] and PCMCI+ [172] . The pioneering Granger non-causality test was developed for analyzing financial data [1] ; however, it is now widely applied in various fields, such as the analysis of magnetoencephalography (MEG) and electroencephalography (EEG) data [173, 174] . Granger causality and its extensions, along with the alternative causality measures developed afterwards, are widely used in different applications. Among others, causality measures are used in finance, e.g., to examine the relations between stock markets [175, 176] ; in neuroscience, e.g., to analyze brain structures and physiological time series [150, 169, 177] ; in seismology, e.g., to analyze earthquake data [178] ; in geoscience, e.g., to discover the effects of weather and vegetation conditions on global wildfire [179] ; in meteorology, e.g., to model air quality [180, 181] ; and in epidemiology [182, 183] .

The estimation of symmetric and causal relationships from observational data has been extensively explored, along with the limitations and pitfalls of the corresponding measures [35, 38, 184-190] . Naturally, depending on the examined application, additional issues may arise that should be addressed. For example, when analyzing electroencephalogram data, spurious functional connectivity may arise from the common reference problem, i.e., from the use of a common reference channel. Connectivity measures that are sensitive to zero-lag correlations may therefore give erroneous indications, depending on the relative strength of the potential fluctuations at the recording and reference locations.

Linear model-based connectivity measures assume that relationships are linear [191, 192] and can be strongly affected by outliers [191, 192] . In some cases, relationships can be linearized by transforming the variables, e.g., with a logarithmic transformation. Alternatively, rank-based measures can be used for monotonic nonlinear relations. If these solutions cannot be applied, nonparametric and nonlinear measures are more appropriate. Moreover, rank-based and information-based measures are robust to outliers.
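
Both remedies can be illustrated with a small example on hypothetical data: for a monotonic but strongly nonlinear relation, Spearman's rank correlation equals 1 while Pearson's is much lower, and a single gross outlier added to two independent samples inflates Pearson's coefficient but barely moves Spearman's. In this sketch Spearman's rho is computed as the Pearson correlation of the ranks.

```python
import numpy as np
from scipy import stats

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(stats.rankdata(x), stats.rankdata(y))

rng = np.random.default_rng(3)

# 1) A monotonic but strongly nonlinear relation.
x = rng.standard_normal(200)
y = np.exp(3 * x)
mono = (pearson(x, y), spearman(x, y))        # (Pearson, Spearman)

# 2) Two independent samples plus a single gross outlier at (50, 50).
u, v = rng.standard_normal(200), rng.standard_normal(200)
u_out, v_out = np.append(u, 50.0), np.append(v, 50.0)
outlier = (pearson(u_out, v_out), spearman(u_out, v_out))
```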

In general, estimators of connectivity measures are biased; therefore, even under the null hypothesis of no connectivity, the estimates will generally differ from zero. Accurate estimation of connectivity measures requires sufficient sample sizes [193] . Guidelines on sufficient sample sizes have been presented for different scenarios [194] [195] [196] , and solutions for different applications with small samples have been proposed [197] [198] [199] .
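
This bias is easy to observe for a simple estimator. The sketch below applies a plug-in mutual information estimator (an equal-width 8 × 8 histogram, an assumed choice) to pairs of independent Gaussian samples, whose true mutual information is zero; the mean estimate is positive and shrinks as the sample size grows. This is one reason significance is assessed against a null distribution, e.g., with permutations, rather than against zero.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Plug-in mutual information (nats) from a 2-D histogram with equal-width bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(4)
mean_estimate = {}
for n in (100, 1000, 10000):
    # x and y are independent, so the true MI is exactly zero.
    estimates = [binned_mi(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(50)]
    mean_estimate[n] = float(np.mean(estimates))
```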

Real data may contain various types and levels of noise, and data measurement methods may introduce measurement errors depending on the application. For example, noise in financial data may stem from small price movements and trading noise that exhibits heavy tails. The effect of noise on correlation measures has been examined in different studies [200] [201] [202] , as has its effect on Granger causality analysis: with the standard linear Granger causality test, noise can give rise to spurious causality and suppress true causality [36] . Nonlinear causality measures are generally more robust to noise than linear ones [26, 125] .

There are various reasons for inferring spurious causal effects, such as unobserved variables, contemporaneous relationships, common inputs, synergetic and redundant influences, and strong autocorrelations in the sample. Another difficulty in causal inference is discriminating between direct and indirect interactions when common inputs exist, although direct causality measures have been developed to address this. Robust methods that can account for the latent effects of unobserved variables are an open area of investigation in connectivity analysis [203] [204] [205] [206] [207] [208] .

The determination of causal directionality for contemporaneous links is an emerging area. An instantaneous causal effect can be interpreted as a zero-lag causality or as a symmetric causal relationship. However, it has been noted that instantaneous causality may arise in case of common sources and latent, unobserved variables [41, 171] .

A plethora of connectivity measures has been briefly discussed above, along with some of their main pitfalls and limitations. However, a key question is whether to apply symmetric correlation measures or directional causality measures to infer the connectivity network of an examined system. Since the connectivity of real systems is unknown, the nature of the examined data and the performance of the connectivity measures are of great importance. Therefore, we generate synthetic time series with known connectivity structures and demonstrate the efficacy of the connectivity measures in three different scenarios. In particular, we examine how different types of dependencies in the samples influence the efficiency of the connectivity measures. First, we consider a simulation system with only contemporaneous dependencies and explore the performance of the connectivity measures, in particular of the causality measures. The second simulation system demonstrates the effect of time-lagged directional relationships on the connectivity measures, in particular on correlation measures, which are not designed to detect lagged dependencies. Finally, we consider a system with both contemporaneous and time-lagged directional relationships and examine the performance of the connectivity measures. To better simulate real data, which are usually non-normal, the noise terms of the stochastic simulation systems are not exclusively Gaussian, as usually assumed in the literature; skewed, non-symmetric noise terms are also considered.

Based on the equations of each simulation system, 100 realizations with sample size n = 2000 are generated and different connectivity measures are computed. Specifically, we estimate four correlation measures, four causality measures and two instantaneous causality measures. The measures are described for the three-variate case with observed variables X, Y and Z; the multivariate measures are defined similarly, with Z replaced by an ensemble of conditioning variables Z = {Z1, . . . , ZK}.

The aim of the study is to provide insights into the effectiveness of the different types of connectivity measures. Since it is impossible to include every existing connectivity measure, an indicative selection of measures is considered, covering the most commonly used types of connectivity measures.

The considered correlation measures are the following ones:

• Partial Pearson correlation coefficient (PPCor), the Pearson correlation of X and Y after removing the linear effect of Z. The Pearson correlation coefficient is $r_{XY} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$, where cov stands for covariance, and σX and σY are the standard deviations of X and Y. Estimation of PPCor is performed with the "partialcorr" function from the Matlab Statistics Toolbox.
• Partial Spearman rank correlation coefficient (PSpCorr), defined similarly to PPCor but on the series of the ranks. Estimation of PSpCorr is performed with the "partialcorr" function from the Matlab Statistics Toolbox.
• Partial distance correlation (pdCor), the extension of the distance correlation (dCor) to the multivariate case. The distance correlation of two random variables is obtained by dividing their distance covariance by the product of their distance standard deviations, $\mathrm{dCor}(X,Y) = \frac{\mathrm{dCov}(X,Y)}{\sqrt{\mathrm{dVar}(X)\,\mathrm{dVar}(Y)}}$. Partial distance correlation is defined on a Hilbert space in which the squared distance covariance is an inner product [51] . Estimation of pdCor is performed with the R code given in [209] .
• Partial transfer entropy on non-uniform embedding (PTENUE) measures the direct effect of Y on X in the presence of the "appropriate" past terms of all the variables, $w_t = \{w_t^X, w_t^Y, w_t^Z\}$: $\mathrm{PTENUE}_{Y \to X|Z} = I(x_{t+1}; w_t^Y \mid w_t^X, w_t^Z)$, where $x_{t+1}$ is the value of X one step ahead. Matlab code for the estimation of PTENUE can be found at http://www.lucafaes.net/its.html (accessed on 23 October 2021).
• Partial directed coherence (PDC) is based on VAR models, like CGCI, but is defined in the frequency domain. For a frequency f, it is given as $\mathrm{PDC}_{Y \to X|Z}(f) = \frac{|\bar{A}_{XY}(f)|}{\sqrt{\sum_{k} |\bar{A}_{kY}(f)|^2}}$, where $\bar{A}(f) = I - \sum_{r=1}^{P} A_r e^{-i 2\pi f r}$, $A_r$ are the coefficient matrices of the VAR model of order P, and k runs over all the variables.

Finally, two instantaneous causality measures are considered in this study:

• Partial mutual information on mixed embedding (PMIME0) is an extension of the causality measure PMIME that also includes zero-lag terms. For the estimations, the Matlab code was provided by the authors [171] .
• Peter Clark momentary conditional independence algorithm (PCMCI+) addresses both lagged and contemporaneous causal discovery. It is an extension of PCMCI, which searches for causal parents based on conditional independence tests. The information-theoretic framework is considered here, where conditional mutual information is used as a general test statistic. Computations are performed with the Python code at https://github.com/jakobrunge/tigramite (accessed on 23 October 2021).
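
For readers working outside MATLAB and R, the following sketch shows one way to compute the partial Pearson and partial Spearman correlations (correlating the residuals of least-squares regressions on the conditioning variables, with all variables replaced by their ranks in the Spearman case) and the sample distance correlation based on double-centered distance matrices. These Python functions are illustrative assumptions, not the partialcorr function or the R code of [209]; in particular, the Hilbert-space construction of pdCor [51] is not implemented, only the bivariate dCor.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z, rank=False):
    """Partial Pearson (or, with rank=True, Spearman) correlation of x and y given z."""
    x, y, z = np.asarray(x, float), np.asarray(y, float), np.asarray(z, float).reshape(len(x), -1)
    if rank:
        x, y = stats.rankdata(x), stats.rankdata(y)
        z = np.column_stack([stats.rankdata(c) for c in z.T])
    Z = np.column_stack([np.ones(len(x)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # part of x not explained by z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # part of y not explained by z
    return float(np.corrcoef(rx, ry)[0, 1])

def distance_correlation(x, y):
    """Sample distance correlation dCov(X,Y) / sqrt(dVar(X) dVar(Y)) for 1-D samples."""
    def centered(v):
        D = np.abs(v[:, None] - v[None, :])
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()
    A = centered(np.asarray(x, float))
    B = centered(np.asarray(y, float))
    dcov2, dvar_x2, dvar_y2 = (A * B).mean(), (A * A).mean(), (B * B).mean()
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x2 * dvar_y2)))

# Usage: a common cause z induces a marginal correlation that vanishes given z.
rng = np.random.default_rng(5)
z = rng.standard_normal(2000)
x = z + rng.standard_normal(2000)
y = z + rng.standard_normal(2000)
marginal = float(np.corrcoef(x, y)[0, 1])
ppcor, pspcorr = partial_corr(x, y, z), partial_corr(x, y, z, rank=True)

# dCor detects a symmetric nonlinear dependence that Pearson misses.
u = rng.uniform(-1, 1, 500)
dcor_square, pearson_square = distance_correlation(u, u ** 2), float(np.corrcoef(u, u ** 2)[0, 1])
dcor_indep = distance_correlation(u, rng.uniform(-1, 1, 500))
```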

Standard free-parameter settings and significance tests are used for each connectivity measure. In particular, the significance of PPCor and PSpCorr is assessed parametrically with the test statistic $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, which follows Student's t-distribution with n − 2 degrees of freedom. For the estimation of MI, we use the KNN method with k = 10 neighbors. Its statistical significance is assessed by randomly permuting the time series; p-values are then estimated from a one-sided test of the null hypothesis that the two variables are independent. The number of permuted time series is set to 100. The significance of pdCor is also assessed with 100 permutations. As previously stated, PDC is estimated for a range of frequencies in [0, 0.5] (256 different frequencies), and its significance is assessed parametrically since it is defined on VARs. The percentage of significant PDC values is examined for each frequency; however, we display the percentage of significant PDC values over all frequencies and realizations, instead of results for specific frequencies or frequency bands. The order of the VAR model is set to P = 1 for system 1 and P = 3 for systems 2 and 3.
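
The two kinds of significance tests described above can be sketched as follows: the parametric t-test for a correlation coefficient with n − 2 degrees of freedom, as stated in the text, and a one-sided permutation test in which one series is randomly shuffled to destroy any dependence. The function names are hypothetical; for a partial correlation with k conditioning variables, the degrees of freedom are commonly reduced to n − 2 − k, which this sketch does not do.

```python
import numpy as np
from scipy import stats

def corr_t_test(r, n):
    """Two-sided p-value for a correlation r: t = r sqrt(n-2) / sqrt(1-r^2), df = n - 2."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return float(2 * stats.t.sf(abs(t), df=n - 2))

def permutation_p_value(stat, x, y, n_perm=100, rng=None):
    """One-sided permutation p-value: shuffling y destroys any x-y dependence."""
    rng = np.random.default_rng(rng)
    observed = stat(x, y)
    null = np.array([stat(x, rng.permutation(y)) for _ in range(n_perm)])
    return float((1 + np.sum(null >= observed)) / (n_perm + 1))

# Usage on hypothetical data.
rng = np.random.default_rng(6)
x = rng.standard_normal(300)
y = 0.2 * x + rng.standard_normal(300)
r = float(np.corrcoef(x, y)[0, 1])
p_param = corr_t_test(r, len(x))

abs_corr = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
p_perm = permutation_p_value(abs_corr, x, x + 0.1 * rng.standard_normal(300), n_perm=100, rng=0)
```

With 100 permutations the smallest attainable p-value is 1/101, which bounds the resolution of such tests.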

A parametric significance test is employed for CGCI and RCGCI, since these measures are also defined on VARs [153] ; the VAR order is set as noted for PDC above. PTENUE and PMIME0 incorporate surrogates within their estimation algorithms, so no separate significance test is required: positive values suggest the existence of causal effects, and zero values are obtained otherwise. The free parameter L_max for the lagged terms is equal to 4 for all systems. The significance level of the test for the termination criterion is 0.05 for PMIME0 and 0.01 for PTENUE. For both measures, the mixed embedding vector has to explain one step ahead, 100 surrogates are used for the significance test and 10 neighbors are used (KNN estimator). Finally, for PCMCI+, the majority rule is assumed for handling ambiguous triples, the significance level is 0.05, and local permutation tests are employed within the estimation procedure to determine the causal parents.

First, we consider a five-variate stochastic nonlinear simulation system in which, by construction, the data have known contemporaneous dependencies and there are no causal influences. The equations of the system are the following:

where e1t follows an exponential distribution with rate λ = 2, e2t follows a chi-squared distribution with 1 degree of freedom, e3t, e4t, e5t follow the standard Gaussian distribution (zero mean, unit standard deviation), and all noise processes are independent of each other. Based on the system's equations, significant positive linear dependencies exist between the variables X2 and X3, whereas nonlinear ones exist between X1 and X4 and between X2 and X5 (Figure 1a). Correlation measures are symmetric; therefore, results are displayed as upper triangular tables. PPCor detects the linear correlation between X2 and X3 and the nonlinear one between X2 and X5; however, the complex nonlinear relationship of X1 and X4 is detected in a very low percentage of the 100 realizations (Table 3). PSpCorr correctly identifies the connectivity network of the system; however, it also indicates the indirect association of X3 and X5 in 100% of the 100 realizations. pdCor and MI perform similarly to PSpCorr, with MI achieving a relatively low percentage of significant correlations for the pair X1-X4 (31%). Since MI is the only bivariate correlation measure considered, it is the only measure expected to indicate the association of X1 and X4. Therefore, although the system is formulated with only contemporaneous dependencies, none of the correlation measures describes the entire connectivity network of the system with complete accuracy. On the other hand, all the direct causality measures correctly suggest that no causal effects exist among the variables of the first simulation system. RCGCI, PTENUE, PDC and PCMCI achieve low percentages of significant causal links over the 100 realizations (around the nominal level of 5%), suggesting that no causal effects exist. PDC gives low percentages of significant effects for all the examined frequencies.
No information about contemporaneous relations can be inferred from the causality measures.
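
Results are reported as the percentage of the 100 realizations in which a link is significant, so for uncoupled pairs this percentage should stay near the nominal 5%. The sketch below illustrates this with two independent noise series drawn like e1t (exponential with rate λ = 2; NumPy parameterizes the exponential distribution by its scale 1/λ) and e2t (chi-squared with 1 degree of freedom), tested with the parametric correlation t-test. It does not reproduce system 1 itself; the seed and loop structure are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, realizations, alpha = 2000, 100, 0.05
rejections = 0
for _ in range(realizations):
    e1 = rng.exponential(scale=1 / 2, size=n)   # rate lambda = 2 -> NumPy scale = 1/lambda
    e2 = rng.chisquare(df=1, size=n)            # chi-squared, 1 degree of freedom
    r = np.corrcoef(e1, e2)[0, 1]               # the two series are independent
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    rejections += p < alpha
percentage = 100 * rejections / realizations    # expected to be close to the nominal 5%
```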

Finally, the instantaneous causality measures PMIME0 and PCMCI+ are estimated, and both their contemporaneous and lagged effects are reported in Table 3. PMIME0 correctly finds the contemporaneous relationships; however, the percentage of significant relations over the 100 realizations for X1-X4 is relatively low (42%). Due to the estimation procedure of PMIME0, its results are approximately symmetric as far as contemporaneous effects are concerned. Regarding the lagged effects based on PMIME0, the percentages of significant links exceed the nominal level (5%) in most directions, the highest being 28% for X1 → X3. Similar behavior has been observed previously for PMIME, which detected non-coupled pairs of variables at percentages larger than the nominal level [26] ; as previously mentioned, PMIME0 is the extension of PMIME that infers both lagged and contemporaneous dependencies. Finally, PCMCI+ correctly identifies the contemporaneous effects with high percentages over the 100 realizations, and it gives low percentages of causal effects for all pairs of variables, most of them slightly above the nominal level.

In the second example, the causal influences between the variables are known by construction, and no contemporaneous influences exist. A nonlinear vector autoregressive (VAR) model of order 3 in five variables is formed. The system's equations are given below:

where e1t, e5t follow the standard Gaussian distribution (zero mean, unit standard deviation), e2t follows an exponential distribution with rate λ = 2, e3t follows the beta distribution with shape parameters a = 1 and b = 2, e4t follows the beta distribution with a = 2 and b = 1, and all noise processes are independent of each other. Based on the system's equations, there are both nonlinear causal influences, i.e., X1 → X2 and X5 → X4, and linear ones, i.e., X1 → X3 and X4 → X5 (Figure 1b).

In the second simulation system, the correlation measures seem to be affected by the temporal dependencies and indicate significant contemporaneous dependencies (Table 4). In particular, all the correlation measures indicate the correlated pairs X1-X3 and X4-X5, whereas pdCor also suggests additional correlated pairs (X1-X2, X1-X4 and X2-X3). We notice that the suggested correlated pairs (X1-X3 and X4-X5) coincide with the pairs of variables with linear causal links, i.e., X1 linearly causes X3 and X4 linearly causes X5.
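
One mechanism consistent with this observation is autocorrelation of the driver: if X1 is autocorrelated and X3 depends on the past of X1, then X1 and X3 are also correlated at lag zero even though no zero-lag link exists. The hypothetical linear pair below (not the equations of system 2) makes this concrete by comparing an autocorrelated driver with a white-noise driver.

```python
import numpy as np
from scipy import stats

def simulate(a, n=2000, seed=8):
    """Hypothetical pair: x1_t = a*x1_{t-1} + e1_t and x3_t = 0.5*x1_{t-1} + e3_t (lag-1 link only)."""
    rng = np.random.default_rng(seed)
    x1, x3 = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        x1[t] = a * x1[t - 1] + rng.standard_normal()
        x3[t] = 0.5 * x1[t - 1] + rng.standard_normal()
    return x1, x3

def zero_lag_corr(x, y):
    """Contemporaneous Pearson correlation and its two-sided t-test p-value."""
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt(len(x) - 2) / np.sqrt(1 - r ** 2)
    return float(r), float(2 * stats.t.sf(abs(t), df=len(x) - 2))

r_auto, p_auto = zero_lag_corr(*simulate(a=0.6))    # autocorrelated driver
r_white, p_white = zero_lag_corr(*simulate(a=0.0))  # white-noise driver
```

With the autocorrelated driver, the lag-one link leaks into a clearly significant zero-lag correlation; with a white-noise driver it does not.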

The linear causality measures CGCI and RCGCI correctly infer the causal relationships; however, the nonlinear link X1 → X2 achieves a relatively low percentage of significant effects over the 100 realizations (20% and 2024%, respectively). The separability assumption states that the driving variable contains unique information about the target variable. When this assumption is satisfied, as in linear stochastic systems, Granger causality is effective, whereas deterministic dynamical systems commonly do not satisfy the separability condition. Therefore, the ability of CGCI and RCGCI to detect nonlinear causal effects depends on the nature of the examined system and on whether the separability assumption is satisfied. The nonlinear causality measure PTENUE also correctly indicates the directional links; however, the nonlinear effect X4 → X5 is detected in only 44% of the 100 realizations. PDC has the lowest performance among the causality measures: it detects X1 → X3 and X4 ↔ X5 and fails to find X1 → X2, whereas the spurious causal effects

Regarding the instantaneous causality measures, PMIME0 identifies no contemporaneous relations and suggests the correct causal effects. As noted for the first simulation system, the percentage of significant causal effects for the non-causal links may exceed the nominal level; however, the obtained percentages are generally lower than those for system 1. PCMCI+ performs worse than PMIME0. It suggests only temporal dependencies; however, large percentages of significant causal effects are obtained for non-causal links, the highest erroneous percentages concerning X2 → X3 (41%) and X3 → X1 (32%); the common input variable X1 possibly confuses PCMCI+.

Finally, we consider a system with both temporal and contemporaneous dependencies, i.e., lagged and zero-lag dependencies. The causal influences between the variables are known by construction. The equations of the system are the following:

where $e_{1,t}$ and $e_{4,t}$ follow the standard Gaussian distribution (mean zero, standard deviation one), $e_{2,t}$ and $e_{3,t}$ follow the beta distribution with shape parameters a = 1 and b = 2, and $e_{5,t}$ follows the gamma distribution with shape a = 16 and rate b = 0.25. All noise processes are independent of each other. Based on the system's equations, there is a contemporaneous relationship between X1 and X2, the linear causal influence X3 → X4, and the nonlinear causal effects X2 → X3 and X3 → X5 (Figure 1c). The correlation measures correctly indicate the contemporaneous relation of X1 and X2; however, they also suggest relations for the pairs of variables with causal links and for many non-causal pairs (Table 5). Therefore, as already noted for the second simulation system, lagged effects affect the performance of the correlation measures, which then indicate erroneous contemporaneous relationships between the variables. Regarding the causality measures, PTENUE suggests the correct causal effects but additionally indicates a causal effect from X1 to X2. Substituting $x_{1,t}$ into the equation of X2 reveals a lagged effect of $x_{1,t-2}$ on $x_{2,t}$: $x_{2,t} = (0.6\,x_{1,t-2} + e_{1,t}) + 0.3\,x_{2,t-1} + e_{2,t}$. Therefore, the link X1 → X2 is not an error of the causality measures; it is a lagged relation that emerges from the equations of the system. RCGCI performs similarly to PTENUE; however, it also detects the indirect link X1 → X3, with a low percentage (22%). PDC again has the worst performance, overestimating the coupled pairs of variables. Such a system favors the instantaneous causality measures, since both contemporaneous and causal effects exist. PMIME0 correctly identifies the contemporaneous dependence of X1 and X2 and the causal links, but it also suggests X2 → X1 (97%). As previously noted, moderately high percentages of significant effects are also obtained for non-causal pairs of variables, reaching 26% for X4 → X5. PCMCI+ correctly infers the contemporaneous and causal dependencies; however, it gives high percentages of significant causal links in almost all directions.
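The substitution argument for X1 → X2 can be checked numerically with a minimal sketch, assuming NumPy. The sketch keeps only two equations, $x_{1,t} = 0.6\,x_{1,t-2} + e_{1,t}$ and $x_{2,t} = x_{1,t} + 0.3\,x_{2,t-1} + e_{2,t}$. These are inferred from the substitution shown in the text; they are an assumption, since the full system is not reproduced here. The noise terms follow the stated distributions: Gaussian $e_{1,t}$ and beta(1, 2) $e_{2,t}$. An OLS regression of $x_{2,t}$ on past values of both variables then recovers a coefficient near 0.6 on $x_{1,t-2}$. This is the lagged X1 → X2 effect that a Granger-type measure picks up.

```python
import numpy as np

rng = np.random.default_rng(2)
n, burn = 3000, 200
e1 = rng.standard_normal(n)          # Gaussian noise of X1
e2 = rng.beta(1.0, 2.0, n)           # beta(1, 2) noise of X2
x1 = np.zeros(n)
x2 = np.zeros(n)
for t in range(2, n):
    x1[t] = 0.6 * x1[t - 2] + e1[t]              # assumed equation of X1
    x2[t] = x1[t] + 0.3 * x2[t - 1] + e2[t]      # assumed equation of X2 (zero-lag X1 term)
x1, x2 = x1[burn:], x2[burn:]

# OLS of x2[t] on an intercept and lags 1-2 of both variables (no zero-lag term)
y = x2[2:]
X = np.column_stack([np.ones(len(y)), x2[1:-1], x2[:-2], x1[1:-1], x1[:-2]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, c in zip(["const", "x2[t-1]", "x2[t-2]", "x1[t-1]", "x1[t-2]"], coef):
    print(f"{name:8s} {c:+.3f}")
```

The intercept absorbs the nonzero mean of the beta noise (1/3). The coefficients on $x_{2,t-2}$ and $x_{1,t-1}$ should be close to zero.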

In this paper, we have presented a brief review of the main connectivity measures currently used to infer the connectivity network of an examined complex system. Connectivity analysis is essential in different applications, such as finance and neurophysiology. Nondirectional measures indicate the symmetric relationships of variables that evolve in synchrony, whereas directional measures infer the directions of causal influences. The main limitations of the connectivity measures have also been briefly discussed.

When studying the interdependencies of a system, connectivity may be inferred based on correlation or causality. However, selecting the proper methodology remains an open issue, since the nature of real systems is generally unknown. This study investigated the efficacy of different connectivity measures on simulated data. Complexity was further increased by using non-normal noise terms to generate samples with skewed and/or non-symmetric distributions. Simulation experiments demonstrated the performance of the connectivity measures in three scenarios: data with only contemporaneous dependencies, data with only directional causal effects, and data with both contemporaneous and temporal dependencies.
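For readers reproducing such skewed noise terms, the snippet below draws the three noise families listed for the third system and reports their sample moments. It is a minimal sketch assuming NumPy's `Generator` API. The one pitfall is parameterization: `Generator.gamma` takes a shape and a *scale*, so the stated rate of 0.25 becomes a scale of 4.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)

rng = np.random.default_rng(3)
n = 100_000
noise = {
    "gaussian": rng.normal(loc=0.0, scale=1.0, size=n),
    "beta(1, 2)": rng.beta(a=1.0, b=2.0, size=n),
    # NumPy parameterizes the gamma by shape and scale; rate 0.25 -> scale 4
    "gamma(16, rate 0.25)": rng.gamma(shape=16.0, scale=1.0 / 0.25, size=n),
}
for name, e in noise.items():
    print(f"{name:22s} mean={e.mean():7.3f} sd={e.std():6.3f} "
          f"skew={sample_skewness(e):+.2f}")
```

The theoretical values are:

- Gaussian: skewness 0.
- beta(1, 2): skewness 2(b - a)√(a + b + 1)/((a + b + 2)√(ab)) ≈ 0.57.
- gamma: skewness 2/√16 = 0.5, mean 16 × 4 = 64.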

The main outcomes of the simulation study can be summarized as follows:

(a) The results suggest that correlation measures are sensitive to temporal dependencies in the data. They tend to erroneously indicate contemporaneous relations even when only lagged dependencies exist.

(b) Causality measures do not spuriously indicate causal effects when the data present only contemporaneous dependencies. The poor performance of PDC for systems 2 and 3 may be because significant PDC values are reported collectively over all the examined frequencies. In real applications, specific frequency bands are usually selected according to the type of data [211, 212].

(c) Instantaneous causality measures handle contemporaneous and causal effects at the same time and therefore seem highly promising for analyzing the connectivity structure of real data. Both instantaneous causality measures considered here show potential and effectively infer the dependencies of most examined systems. However, they tend to give high percentages of significant causal effects for non-causal pairs of variables, which clearly reduces their effectiveness. Different values of the free parameters, such as the significance level or the number of neighbors for PMIME0, may improve their performance. Here, however, only standard values of the free parameters are used, for all examined systems and all causality measures. Optimizing the free parameters is beyond the scope of this work. Nevertheless, real applications need an automatic selection of the free parameters of any connectivity measure.

This indicative simulation study highlights the limitations and advantages of the different connectivity measures. Its outcomes suggest the superiority of the causality measures over both the correlation measures and the instantaneous causality measures. Correlation measures are strongly affected by lagged directional relationships. Instantaneous causality measures, although promising, still need to be optimized before they can be applied effectively. Correlation measures have nevertheless long been used effectively and are suitable for specific data types with possibly known topological features or characteristics, e.g., [213, 214]. Furthermore, they are extremely fast to compute, in contrast to the time-consuming estimation of most nonlinear causality measures.

Future studies will further investigate these findings by testing additional scenarios for the samples and the nature of the dependencies. These include samples with longer memory, samples that exhibit volatility clustering, and samples of higher dimension.

Investigating causal relations by econometric models and cross-spectral methods

Comparisons of EEG spectral and correlation measures between healthy term and preterm infants

A general statistical framework for frequency-domain analysis of EEG topographic structure

A comparison of high-frequency cross-correlation measures

Pearson versus Spearman, Kendall's tau correlation analysis on structure-activity relationships of biologic active compounds

Comparison of co-expression measures: Mutual information, correlation, and model based indices

Detecting pairwise correlations in spike trains: An objective comparison of methods and application to the study of retinal waves

Survey on the estimation of mutual information methods as a measure of dependency versus correlation analysis

Efficient test for nonlinear dependence of two continuous variables

Comparison of some correlation measures for continuous and categorical data

Mean phase coherence as a measure for phase synchronization and its application to the EEG of epilepsy patients. Phys. D Nonlinear Phenom

Event synchronization: A simple and fast method to measure synchronicity and time delay patterns

Phase synchronization measurements using electroencephalographic recordings

Synchronization measures in EEG signals

Comparison of phase synchronization measures for identifying stimulus-induced functional connectivity in human magnetoencephalographic and simulated data

Evaluating phase synchronization methods in fMRI: A comparison study and new approaches

Comparison of univariate and multivariate Granger causality in international asset pricing. Evidence from Finnish and Japanese financial economies

Comparison of Granger causality and phase slope index

Reliability of multivariate causality measures for neural data

A comparison of multivariate causality based measures of effective connectivity

Comparative performance evaluation of data-driven causality measures applied to brain networks

Measures of causality in complex datasets with application to financial data

Dimension reduction of frequency-based direct Granger causality measures on short time series

Evaluation of Granger causality measures for constructing networks from multivariate time series

Detecting direct causality in multivariate time series: A comparative study

Price correlation and Granger causality tests for market definition

Stock Markets, Banks, and Growth: Correlation or Causality?

Econometric measures of connectedness and systemic risk in the finance and insurance sectors

Pearson correlation and Granger causality analysis of Twitter sentiments and the daily changes in Bist30 index returns

What is strong correlation?

Common pitfalls in statistical analysis: The use of correlation techniques

Should Pearson's correlation coefficient be avoided?

Corruption of the Pearson correlation coefficient by measurement error and its estimation, bias, and correction under different error models

Conducting correlation analysis: Important limitations and pitfalls

Mitigating the effects of measurement noise on Granger causality

The impact of latent confounders in directed network analysis in neuroscience

A tutorial review of functional connectivity analysis methods and their interpretational pitfalls

Assessing granger causality in electrophysiological data: Removing the adverse effects of common signals via bipolar derivations

Testing different methodologies for Granger causality estimation: A simulation study

The effect of a hidden source on the estimation of connectivity networks from multivariate time series

Causal inference for time series analysis: Problems, methods and evaluation. arXiv 2021

Note on regression and inheritance in the case of two parents

The proof and measurement of association between two things

Rank Correlation Methods

A non-parametric test of independence

A multivariate version of Hoeffding's phi-square

Linear Statistical Inference and Its Applications

Measuring and testing dependence by correlation of distances

Brownian distance covariance

Partial distance correlation with methods for dissimilarities

On the association of attributes in statistics: With illustrations from the material of the childhood society &c

On the methods of measuring association between two attributes

Approximating the tetrachoric correlation coefficient

Statistical inference for generalized Yule coefficients in 2 × 2 contingency tables

The randomized dependence coefficient

Copula correlation: An equitable dependence measure and extension of Pearson's correlation

A copula-based correlation measure and its application in Chinese stock market

Copula-based measures of multivariate association

Partial correlation with copula modeling

Copula-based kernel dependency measures. arXiv 2012

Copula-based analysis of multivariate dependence patterns between dimensions of poverty in Europe

On the copula correlation ratio and its generalization

A nonlinear correlation measure for multivariable data set. Phys. D Nonlinear Phenom

Independent coordinates for strange attractors from mutual information

Gambling and data compression

Mutual information matrix based on asymmetric Shannon entropy for nonlinear interactions of time series

Entropy for measuring predictive power of generalized linear models

Entropy coefficient of determination for generalized linear models

Detecting novel associations in large data sets

Model selection method based on maximal information coefficient of residuals

Time-delayed mutual information of the phase as a measure of functional connectivity

Electric Fields of the Brain: The Neurophysics of EEG

Random Data

Identification of patterns of neuronal connectivity-Partial spectra, partial coherence, and neuronal interactions

Multivariate partial coherence analysis for identification of neuronal connectivity from multiple electrode array recordings

Phase synchronization of chaotic oscillators

Measuring phase synchrony in brain signals. Hum. Brain Mapp

Cooperative dynamics of oscillator community: A study based on lattice of rings

Recurrence plots of dynamical systems

Detecting deterministic signals in exceptionally noisy environments using cross-recurrence quantification

Nonlinear analysis of bivariate data with cross recurrence plots

Embeddings and delays as derived from quantification of recurrence plots

Dynamical assessment of physiological systems and states using recurrence plot strategies

General formulation of Shannon's main theorem in information theory

A definition of conditional mutual information for arbitrary ensembles

Elements of Information Theory

Introduction to Robust Estimation and Hypothesis Testing

Discovery of meaningful associations in genomic data using partial correlation coefficients

Comparing Pearson, Spearman and Hoeffding's D measure for gene expression association analysis

Correlation of tau gene polymorphism with age at onset of Parkinson's disease

Three-channel correlation analysis: A new technique to measure instrumental noise of digitizers and seismic sensors

On the importance of the Pearson correlation coefficient in noise reduction

Application of Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to identify disease-specific biomarker genes

Normalized measures of mutual information with general definitions of entropy for multimodal image registration

Portfolio optimization using rank correlation

Linear and nonlinear market correlations: Characterizing financial crises and portfolio optimization

Rank correlation analysis of investment decision for small investors in the Hong Kong derivatives markets

Application of Pearson correlation coefficient in wind power combination prediction

The networked partial correlation and its application to the analysis of genetic interactions

Reliability of principal component analysis and Pearson correlation coefficient, for application in artificial neural network model development, for water treatment plants

Tourism economies and islands' resilience to the global financial crisis

Noise-assisted multivariate EMD-based mean-phase coherence analysis to evaluate phase-synchrony dynamics in epilepsy patients

Networks in financial markets based on the mutual information rate

Quantifying influence in financial markets via partial correlation network inference

Multi-target positive emotion recognition from EEG signals

A correlation study between meteorological parameters and COVID-19 pandemic in Mumbai

An efficient stock market trend prediction using the real-time stock technical data and stock social media data

Pearson correlation coefficient-based performance enhancement of Vanilla neural network for stock trend prediction

What is information theory

Measurement of linear dependence and feedback between multiple time series

Partial Granger causality-Eliminating exogenous inputs and latent variables

Radial basis function approach to nonlinear Granger causality of time series

Kernel-Granger causality and the analysis of dynamical networks

A new NARX-based Granger linear and nonlinear casual influence detection method with applications to EEG data

A General Test for Granger Causality: Bivariate Model; Working Paper

Testing for linear and nonlinear Granger causality in the stock price-volume relation

A new statistic and practical guidelines for nonparametric Granger causality testing

Multivariate linear and nonlinear causality tests

Measuring information transfer

Confounding effects of indirect connections on causality estimation

Detection of direct causal effects and application to epileptic electroencephalogram analysis

MuTE: A MATLAB toolbox to compare established and novel estimators of the multivariate transfer entropy

Nonuniform state-space reconstruction and coupling detection

Direct-coupling information measure from nonuniform embedding

Low-dimensional approximation searching strategy for transfer entropy from non-uniform embedding

Detecting causality in multivariate time series via non-uniform embedding

Performance of different synchronization measures in real data: A case study on electroencephalographic signals

A robust method for detecting interdependences: Application to intracranially recorded EEG. Phys. D Nonlinear Phenom

Learning driver-response relationships from synchronization patterns

Topographic organization of nonlinear interdependence in multichannel human EEG

Nonlinear interdependence in neural systems: Motivation, theory, and relevance

Bivariate surrogate techniques: Necessity, strengths, and caveats

Effective detection of coupling in short and noisy bivariate data

Analyzing multiple nonlinear time series with extended Granger causality

Detecting causality in complex ecosystems

Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

Detecting causal relations in the presence of unmeasured variables

Causation Prediction and Search

Causation, Prediction, and Search

Causality: Models, Reasoning and Inference

A linear non-Gaussian acyclic model for causal discovery

Nonlinear causal discovery with additive noise models

Probabilistic Graphical Models: Principles and Techniques

On causal discovery from time series data using FCI

Detecting and quantifying causal associations in large nonlinear time series datasets

Causal network discovery by iterative conditioning: Comparison of algorithms

A new method of the description of the information flow in the brain structures

Partial directed coherence: A new concept in neural structure determination

Determination of information flow direction among brain structures by a modified directed transfer function (dDTF) method

Generalized partial directed coherence

Robustly estimating the flow direction of information in complex physical systems

Testing for short- and long-run causality: A frequency-domain approach

Analyzing information flow in brain networks with nonparametric Granger causality

Inferring direct directed-information flow from multivariate nonlinear time series

Kalman filter-based time-varying cortical connectivity analysis of newborn EEG

A nonlinear causality measure in the frequency domain: Nonlinear partial directed coherence with applications to EEG

New Introduction to Multiple Time Series Analysis

Extended Granger causality: A new tool to identify the structure of physiological networks

Testing instantaneous causality in presence of nonconstant unconditional covariance

Estimation of a structural vector autoregression model using non-gaussianity

Causal inference on time series using restricted structural equation models

Towards a learning theory of cause-effect inference

Elements of Causal Inference: Foundations and Learning Algorithms

A framework for assessing frequency domain causality in physiological time series with instantaneous effects

Frequency domain repercussions of instantaneous Granger causality

Instantaneous Granger causality with the Hilbert-Huang transform

Extended causal modeling to assess partial directed coherence in multiple time series with significant instantaneous interactions

Compensated transfer entropy as a tool for reliably estimating information transfer in physiological time series

Directed information graphs for the Granger causality of multivariate time series

Identification of hidden sources by estimating instantaneous causality in high-dimensional biomedical time series

Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets

Lexical influences on speech perception: A Granger causality analysis of MEG and EEG source estimates

Granger causality analysis in neuroscience and neuroimaging

Comparison of transfer entropy methods for financial time series

How do the global stock markets Influence one another? Evidence from finance big data and granger causality directed network

Disclosing large-scale directed functional connections in MEG with the multivariate phase slope index

Discovery of spatial-temporal causal interactions between thermal and methane anomalies associated with the Wenchuan earthquake

Causation discovery of weather and vegetation condition on global wildfire using the PCMCI Approach

An extended spatio-temporal Granger causality model for air quality estimation with heterogeneous urban big data

Causal identification based on compressive sensing of air pollutants using urban big data

Criticality and information dynamics in epidemiological models

A review of spatial causal inference methods for environmental and epidemiological applications

Correlation and dependence in risk management: Properties and pitfalls. Risk Manag. Value Risk Beyond

Measuring connectivity in linear multivariate processes: Definitions, interpretation, and practical analysis

Statistical pitfalls in the comparison of multivariate causality measures for effective causality

A lack of statistical pitfalls in the comparison of multivariate causality measures for effective causality

A study of problems encountered in Granger causality analysis from a neuroscience perspective

On the interpretability and computational reliability of frequency-domain Granger causality

Advanced modelling strategies: Challenges and pitfalls in robust causal inference with observational data

Teaching statistics = teaching thinking statistically

Detecting causality in non-stationary time series using partial symbolic transfer entropy: Evidence in financial data

Online platform supporting teaching correlation

Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses

Sample size guideline for correlation analysis

Minimum sample size for reliable causal inference using transfer entropy

Making Causal Inferences in Small Samples Using Synthetic Control Methodology: Did Chrysler Benefit from Government Assistance?

Detecting causality from short time-series data based on prediction of topologically equivalent attractors

Estimation of causal effects with small data in the presence of trapdoor variables

Effect of noise on the evaluation of correlation coefficients in two-dimensional correlation spectroscopy

Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data

The effect of noise reduction in measuring the linear and nonlinear dependency of financial markets

Neural characterization in partially observed populations of spiking neurons

Modeling the impact of common noise inputs on the network activity of retinal ganglion cells

Causal inference with multiple time series: Principles and problems

Inferring hidden states in a random kinetic Ising model: Replica analysis

Causal inference by identification of vector autoregressive processes with hidden components

Causal network reconstruction from time series: From theoretical assumptions to practical estimation

E-Statistics: Multivariate Inference via the Energy of Data

Estimating mutual information

Enhanced frontocentral EEG connectivity in photosensitive generalized epilepsies: A partial directed coherence study

Using partial directed coherence to study alpha-band effective brain networks during a visuospatial attention task

ENSO and soybean prices: Correlation without causality

Correlation: Not all correlation entails causality

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.