key: cord-0498679-8isy0ucv authors: Gupta, Kartikay; Chatterjee, Niladri title: Examining Lead-Lag Relationships In-Depth, With Focus On FX Market As Covid-19 Crises Unfolds date: 2020-04-22 sha: 2aeb129da485f4e4c98311f6b4c9130e772b3334 doc_id: 498679 cord_uid: 8isy0ucv
The lead-lag relationship plays a vital role in financial markets. It is the phenomenon in which one price series lags behind, and partially replicates the movements of, a leading time series. The present research proposes a new technique that better identifies the lead-lag relationship empirically. Apart from identifying the lead-lag path more accurately, the technique also provides a measure of closeness between financial time series. The proposed measure is closely related to correlation, and it uses Dynamic Programming to find the optimal lead-lag path. Further, it retains most of the properties of a metric, so much so that it is termed a loose metric. Tests are performed on Synthetic Time Series (STS) with a known lead-lag relationship, and the technique is compared with other state-of-the-art models on the basis of significance and forecastability. The proposed technique gives the best results in both tests: all the paths it finds are statistically significant, and its forecasts are closest to the target values. We then use the measure to study the evolution of the topology of the Foreign Exchange (FX) market as the COVID-19 pandemic unfolds. Here, we study the FX currency prices of 29 prominent countries of the world. We observe that, as the crisis unfolds, all the currencies become strongly interlinked with each other, and the US Dollar starts playing an even more central role in the FX market. Finally, we mention several other application areas of the proposed technique for designing intelligent systems.
Information plays a critical role in our lives, specifically in financial markets.
In financial markets, any accurate information about future trends can be very financially rewarding. It is sometimes observed that a pair of stocks does not have a high Pearson correlation coefficient, yet the two series are highly correlated at a certain lag. It has also been found that the prices of certain financial commodities follow the trends of other commodities. This phenomenon, in which one time series partially replicates the movements of a leading time series at a specific time lag, is called the lead-lag relationship [1]. The present paper proposes an empirical technique that better identifies the time-varying lead-lag relationship between two time series. The rest of the paper is organized as follows. We first give a broad literature review of the lead-lag relationship. Then, we describe the proposed methodology for determining the lead-lag relationship. Next, we test this technique empirically by recovering a known lead-lag relationship in synthetic time series. Finally, we use this methodology to uncover patterns or connections between financial time series in the FX market. It may seem intuitive that any information, such as a lead-lag relationship, that can be used for trend discovery would be exploited immediately in profitable transactions. This is not always the case, owing to factors such as delays in information transmission or information arrival. The phenomenon is most clearly visible in the spot and options price series of the same underlying stock or commodity. Here, the options price series is sometimes observed to lead the spot price series [2], possibly because new information about the future trend is assimilated more quickly into option prices than into spot prices. The lead-lag relationship can also be observed in the spot and futures prices of a commodity.
The lead-lag relationships among the KOSPI200 spot, futures and options markets are empirically examined and discussed by Lee et al. [2]. Tian et al. [3] investigate Taiwan's financial markets and find that index futures prices during the non-cash-trading period lead the cash market during its opening period. Moews et al. [4] develop an intelligent system that better predicts future movements in financial time series using lagged correlations with other time series. Hui et al. [5] find a time-dependent lead-lag relationship between prices and volume in mini Taiwan exchange futures. The lead-lag relationship is also observed in the price series of a commodity traded on different exchanges. Here, traders use High-Frequency Trading (HFT) to quickly eliminate price inconsistencies between two exchanges, earning large profits in the process. High-frequency data in financial markets are recorded at irregular intervals, which makes it challenging to determine the lead-lag relationship between two stocks or markets. To address this, an estimator is proposed in [6] that better estimates the cross-covariance by avoiding imputation and using all available transactions. Robert et al. [7] study certain properties of the covariance matrix of the increments of two Gaussian processes that are partially correlated at some time lag. It may not always be possible to exploit lead-lag information profitably. Still, a lead-lag relationship between two time series may indicate causality or a strong connection between them. The Thermal Optimal Path (TOP) method, first proposed by Zhou and Sornette [8], has been used to obtain a continuously time-varying lead-lag path between two financial instruments. Papers [8]-[11] are among those that use TOP in their analysis of financial markets.
The TOP method has been adopted from the physics literature, and it uses the Euclidean distance for comparison at the most basic, or micro, level. The present measure uses a correlation-based distance at the micro level. Correlation distance is more suitable for financial data, as it may indicate causality between the two time series. Further, TOP does not explicitly provide any measure to quantify the strength of the relationship between the two financial time series. Another significant distance measure for general time-series data is Dynamic Time Warping (DTW) [12], [13]. DTW is generally considered the best distance measure for time-series mining tasks across virtually all domains [14]. DTW is especially advantageous in speech recognition [15], where it can recognize spoken words even when different parts of a word are elongated to different extents. Jin et al. [13] used the DTW measure to analyse the network structure of the Foreign Exchange market. Zhu et al. [16] sought to reduce the time complexity of DTW by approximating its value. Silva et al. [14] study the effects of relaxing various constraints on the DTW distance measure. TOP has been one of the most prominent methodologies for empirically finding the lead-lag path, and the present methodology gives better results than TOP. TOP may be considered a more theoretically evolved version of DTW, as it also uses Dynamic Programming for computation. The present methodology also improves upon DTW by combining the properties of the DTW measure and the correlation measure. The proposed Aligned Correlation (AC) measure can determine the lead-lag relationship between time series more accurately. Empirically determining the best lead-lag path (the exact solution) between two time series requires exponential time, as explained in the next section.
This is an NP-hard problem, and Dynamic Programming (DP) is used to obtain an approximate solution in much less time. DP is also commonly used to solve other NP-hard problems [17]. The present AC measure is motivated by the DCCT measure described in [18]. Unlike DCCT, the AC measure does not require choosing a single value of the free parameter 'p'. Further, the present work provides an in-depth theoretical discussion of the metric properties of the AC measure, along with more elaborate testing and comparisons than [18]. Whereas the DCCT measure is used for profitable pairs trading, the AC measure is used here to study the Foreign Exchange market. We now describe the proposed AC measure in detail. Let x_t, y_t (t ∈ {1, 2, …, n}) be two time series. The parameter name 'psi' (which stands for 'Post Suffix Invariant') is inspired by [14], where it was introduced to relax the boundary condition in DTW. The alignment path is computed using Dynamic Programming, as in the DTW measure [14]. The AC measure uses the above definition in its construction. Let us call the computation of the alignment path Step 1. As mentioned earlier, let x_t, y_t (t ∈ {1, 2, …, n}) be two time series of normalised prices of two stocks, and let rx_t, ry_t (t ∈ {1, 2, …, n}) denote the corresponding return series. Let CR(p_l, q_l, p), a function over the sequence P_l = (p_l, q_l), be defined as follows: Here, 'p' is the window size parameter, which denotes the length of the window. The time series are padded with ⌊p/2⌋ zeros at both ends so that the above expression can be calculated. In the present research, we use three values of the parameter 'p', i.e., 25, 51 and 101. Then, we find an alignment path P = (P_1, P_2, …, P_l, …, P_L) that minimises the sum of this function along the path, subject to the path constraints mentioned earlier. This is done with the same Dynamic Programming technique used in the DTW measure.
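The exact expression for CR(p_l, q_l, p) is not reproduced in this version of the text, so the sketch below rests on an explicit assumption: that CR is the correlation distance √(2(1 − ρ)) between the width-p windows of the zero-padded return series centred at p_l and q_l. This is consistent with the later statement that CR is the Euclidean distance between two normalised vectors, but it is an illustration, not the authors' definition. The helper names `zero_pad` and `cr` are hypothetical.

```python
import numpy as np


def zero_pad(r: np.ndarray, p: int) -> np.ndarray:
    """Pad a return series with floor(p/2) zeros at both ends."""
    h = p // 2
    return np.concatenate([np.zeros(h), np.asarray(r, dtype=float), np.zeros(h)])


def cr(rx_pad: np.ndarray, ry_pad: np.ndarray, i: int, j: int, p: int) -> float:
    """Assumed CR(i, j, p): sqrt(2(1 - rho)) between the width-p windows of the
    padded returns centred at 0-based original indices i (x) and j (y).
    p is taken to be odd (25, 51 and 101 in the text)."""
    h = p // 2
    # Original index i sits at padded index i + h, so its window is [i, i + 2h].
    a = rx_pad[i:i + 2 * h + 1]
    b = ry_pad[j:j + 2 * h + 1]
    a = a - a.mean()
    b = b - b.mean()
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    # Hypothetical convention: a constant window is treated as uncorrelated.
    rho = float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0
    # Clip tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - rho))))
```

Because the two windows are centred, a/‖a‖ and b/‖b‖ are unit vectors whose dot product is ρ, so ‖a/‖a‖ − b/‖b‖‖² = 2(1 − ρ). This is why a correlation distance of this form behaves like a Euclidean distance on normalised windows.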
The minimisation of the alignment path is done in two steps:
1) First, an optimal path is calculated for each value of the parameter p (i.e., 25, 51 and 101):
$$P^{(p)} = \arg\min_{P} \sum_{l=1}^{L} CR(p_l, q_l, p)$$
This is achieved with the DTW algorithm, in which the Euclidean distance is replaced by the CR metric, which is itself the Euclidean distance between two normalised vectors.
2) Then, among these paths, we pick the path P = {(p_i, q_i), i = 1, …, L} that minimises the correlation distance along the path:
$$P = \arg\min_{P^{(p)},\; p \in \{25, 51, 101\}} \sqrt{2\left(1 - \rho_{P^{(p)}}\right)}$$
where ρ_P denotes the Pearson correlation between the two series after they are aligned along P.
Step 2: Computation of AC measure
Finally, the AC measure is the correlation distance along the alignment path:
$$AC = \sqrt{2\left(1 - \rho_{P}\right)}$$
where P is the chosen alignment path (p_i, q_i). In the present experiments, the parameter 'psi' has been kept equal to the parameter 'p' for simplicity. Here, √(2(1 − ρ)) is chosen because it transforms the correlation ρ into the Euclidean distance between two zero-mean time series scaled to unit norm. Casacuberta et al. [19] showed empirically that the DTW measure satisfies the triangle inequality (TI) in a statistical sense; they reached this conclusion by testing over 15 million triplets, arising from 800 time series of speech data, for TI. Following [19], the AC measure can also be termed a 'loose metric'. This is because the AC measure is the Euclidean distance between the normalised time series along the warping path, i.e., a DTW measure. Here, the normalising variance differs slightly from the variance of the original time series, owing to the repetition and occasional removal of a few terms. In fact, the AC measure differs from the Euclidean metric only because of the different alignment of the series in time. The path obtained here is a valid warping path from start to end. It is not the absolute optimal path that minimises the Euclidean distance; rather, it additionally maximises the correlation along the path. Many modifications of DTW have been proposed that impose additional constraints on the DTW measure.
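The two-step optimisation and the resulting AC value can be sketched as follows. This is a minimal, unoptimised illustration, not the authors' implementation, and it rests on several assumptions: the local cost is the window correlation distance assumed for CR, the ψ relaxation is read as letting the path start anywhere within the first ψ points of either series and end within the last ψ points (in the spirit of [14]), and ρ_P is computed on the return series re-indexed along the path. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _unit_rows(w):
    """Centre each row and scale it to unit norm (all-zero rows stay zero)."""
    w = w - w.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return np.divide(w, norms, out=np.zeros_like(w), where=norms > 0)


def cr_matrix(rx, ry, p):
    """Assumed local cost CR[i, j] = sqrt(2(1 - rho)) between the width-p
    windows of the zero-padded returns centred at i (rx) and j (ry); p odd."""
    h = p // 2
    wx = sliding_window_view(np.pad(rx, h), 2 * h + 1)  # row i is centred at i
    wy = sliding_window_view(np.pad(ry, h), 2 * h + 1)
    rho = _unit_rows(wx) @ _unit_rows(wy).T
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))


def psi_dtw_path(C, psi):
    """DTW path over cost matrix C in which the first / last `psi` points of
    either series may be skipped (our reading of the psi relaxation)."""
    n, m = C.shape
    D = np.full((n, m), np.inf)
    move = np.zeros((n, m), dtype=int)  # 0 = start cell, 1 = diagonal, 2 = up, 3 = left
    steps = {1: (-1, -1), 2: (-1, 0), 3: (0, -1)}
    D[0, :psi + 1] = C[0, :psi + 1]  # free start along the first row
    D[:psi + 1, 0] = C[:psi + 1, 0]  # free start along the first column
    for i in range(n):
        for j in range(m):
            if (i == 0 and j <= psi) or (j == 0 and i <= psi):
                continue  # costs are non-negative, so a free start is never beaten
            for code, (di, dj) in steps.items():
                pi, pj = i + di, j + dj
                if pi >= 0 and pj >= 0 and C[i, j] + D[pi, pj] < D[i, j]:
                    D[i, j] = C[i, j] + D[pi, pj]
                    move[i, j] = code
    # Free end: the path may stop within the last `psi` points of either series.
    ends = [(n - 1, j) for j in range(max(0, m - 1 - psi), m)]
    ends += [(i, m - 1) for i in range(max(0, n - 1 - psi), n)]
    i, j = min(ends, key=lambda c: D[c])
    path = [(i, j)]
    while move[i, j] != 0:  # walk back until a start cell is reached
        di, dj = steps[move[i, j]]
        i, j = i + di, j + dj
        path.append((i, j))
    return path[::-1]


def aligned_corr_distance(rx, ry, path):
    """sqrt(2(1 - rho_P)), with rho_P the correlation of the returns
    re-indexed along the path (an assumption about which series is used)."""
    idx = np.asarray(path)
    rho = np.corrcoef(rx[idx[:, 0]], ry[idx[:, 1]])[0, 1]
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - rho))))


def ac_measure(rx, ry, windows=(25, 51, 101)):
    """Step 1: one optimal path per window size p (with psi = p, as in the text).
    Then keep the path with the smallest aligned correlation distance."""
    best = None
    for p in windows:
        path = psi_dtw_path(cr_matrix(rx, ry, p), psi=p)
        d = aligned_corr_distance(rx, ry, path)
        if best is None or d < best[0]:
            best = (d, path, p)
    return best  # (AC value, alignment path, chosen window size)
```

The DP visits O(n²) cells per window size, as DTW does, and the window correlations add work that grows with p and with the number of window sizes tried, in line with the remark that AC has DTW's order of complexity but a larger constant.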
This measure may also be considered one that imposes additional constraints on the DTW measure. It is crucial that CR be a metric; if it is replaced with another measure, that measure must also be a metric. Also, the final minimised sums of CR values cannot be compared across different values of p: each sum can be rearranged into a sum of squared differences between certain normalised segments of the two time series, so its value depends on p. Thus, instead, we minimise the correlation metric over the aligned path. Here, DP can be used to find an approximate solution quickly. The proposed AC measure has the same order of time complexity as the DTW measure in terms of the length of the time series. However, the AC measure has a larger constant factor, which increases with the number of window sizes (values of the parameter 'p') used to find the alignment path. When Zhou and Sornette [8] first introduced the TOP method for applications in economics, they justified its use with two comparative experiments on synthetic time-series data. Here, we perform very similar experiments to establish the superiority of the AC path over the other models. Another experiment, testing the self-consistency of TOP results, has been conducted several times before, e.g., in [9]-[11]. This test is also performed to determine the validity of the proposed AC technique. Suppose we have a synthetic time series with a time-varying lead-lag relationship. Then these methods should be able to detect this path experimentally, even in the presence of noise. This experiment is based on the experiments in [8]. In general, the synthetic time series are of the form
$$Y(t_2) = a\,X(t_1) + \eta,$$
where X(t_1) is generated through the following process:
$$X(t) = b\,X(t-1) + \xi;$$
here, the noise terms η and ξ will be explained shortly.
Now, given the two time series over some finite length, our aim is to determine the lead-lag structure, i.e., the lag x(t) = t_2 − t_1, empirically. In the present experiments, we use four different synthetic time series (STS) in two different sets of experiments. We use STS-1 and STS-2 to test the significance of the lead-lag paths obtained by the different algorithms. This test has previously been employed in [9]-[11] to test the significance of TOP. The underlying logic of this test is that, if the lead-lag path x(t) is significant, then the two synchronised time series, i.e., X(t − x(t)) and Y(t), should exhibit a strong linear dependence. This leads to the following regression:
$$Y(t) = a\,X(t - x(t)) + \varepsilon(t).$$
In the above equation, the coefficient 'a' should be significantly different from zero for a statistically significant dependence. Next, we give the details of STS-1. The STS-1 is as follows:
$$Y(t) = a\,X(t - x(t)) + \eta(t),$$
where X(t) is itself a stochastic process given by
$$X(t) = b\,X(t-1) + \xi(t),$$
where b < 1 and the noise ξ ∼ N(0, σ_ξ) is serially uncorrelated. The factor f = σ_η/σ_ξ quantifies the amount of noise degrading the causal relationship between X(t_1) and Y(t_2). All the STS are similar processes with different parameter values. In our simulations, we generate X and Y of the first STS with parameters a = 0.8, b = 0.7 and f = 0.5. TOP has a free parameter, 'temperature', which needs to be fixed before finding the path. It is proposed in [11] that a temperature of 2 is generally optimal, and this finding is reaffirmed in [20]. The path obtained by AC is visually closer to the actual path than those of the other models. As seen in Figure 1, the AC method identifies the lead-lag structure x(t) perfectly during the periods when x(t) remains temporarily unchanged. It only occasionally fails to capture the path during periods of transitions or jumps. In contrast, TOP performs worse than AC both during transitions of x(t) and when x(t) is unchanged. The DTW measure performs visibly better than TOP but worse than AC.
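The STS construction and the significance regression can be sketched as follows. This is a minimal illustration that assumes Gaussian noise with σ_η = f·σ_ξ, a burn-in period for the AR(1) process, and a regression of the form Y(t) = a·X(t − x(t)) + ε(t) without an intercept. The actual lag paths of STS-1 to STS-4 are not reproduced in this text, so the piecewise-constant lag path in the usage below is hypothetical, as are the function names.

```python
import numpy as np


def make_sts(lag, a=0.8, b=0.7, f=0.5, sigma_xi=1.0, seed=0):
    """Synthetic pair: X(t) = b X(t-1) + xi(t), Y(t) = a X(t - lag[t]) + eta(t),
    with xi ~ N(0, sigma_xi) and eta ~ N(0, f * sigma_xi), so f = sigma_eta / sigma_xi."""
    rng = np.random.default_rng(seed)
    lag = np.asarray(lag, dtype=int)
    n, burn = len(lag), int(lag.max()) + 50  # burn-in lets the AR(1) settle
    xi = rng.normal(0.0, sigma_xi, n + burn)
    x_full = np.zeros(n + burn)
    for k in range(1, n + burn):
        x_full[k] = b * x_full[k - 1] + xi[k]
    eta = rng.normal(0.0, f * sigma_xi, n)
    t = np.arange(n)
    Y = a * x_full[burn + t - lag] + eta
    return x_full[burn:], Y


def lag_regression_tstat(X, Y, lag):
    """OLS without intercept of Y(t) on X(t - lag[t]); returns (a_hat, t-statistic)."""
    src = np.arange(len(Y)) - np.asarray(lag, dtype=int)
    ok = (src >= 0) & (src < len(X))  # keep only in-sample lagged values
    xs, ys = X[src[ok]], Y[ok]
    a_hat = (xs @ ys) / (xs @ xs)
    resid = ys - a_hat * xs
    se = np.sqrt((resid @ resid) / (len(ys) - 1) / (xs @ xs))
    return float(a_hat), float(a_hat / se)
```

A window can then be counted as significant when |t| exceeds the usual critical value (about 1.96 at the 5% level). The lag path passed in may be the true path, an estimated one, or all zeros for an 'Unsynched Path' baseline.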
The same phenomenon is observed in Figures 2, 3 and 4. Next, we examine the empirically obtained lead-lag structure using the self-consistency test, which we perform analogously to [11]. We implement this test in moving windows of size 100 that advance one time step at a time from the beginning to the end of the time series. Thus, we obtain 300 − 100 + 1 = 201 such windows over the time series of length 300. Within each window, the two time series are synchronised (or left unsynchronised, in the case of the 'Unsynched Path') to estimate the significance of the coefficient 'a'. Table 1 gives the results of this experiment. Here, we observe that among the 201 windows with We repeat this experiment on STS-2 and again find that AC gives the best results, as seen in Table 2. Next, we perform experiments as in [8] to test the forecastability of the lead-lag structure. Here, we consider synthetic time series in which X(t) generally leads Y(t), so that values of X(t) can be used to predict future values of Y(t). We use STS-3 and STS-4 for this test, which examines whether correct forecasts can be obtained from the lead-lag paths found empirically by the different algorithms. Next, we give the details of the STS used in this test. The STS-3 is as follows: In this prediction set-up, we assume that the underlying model is known and the only challenge is to calibrate the lag. The predicted values Ŷ(i+1) are compared with the actual values Y(i+1) using the Mean Absolute Deviation (MAD) error. MAD indicates the average loss that would be incurred by financially 'betting' on the predicted values Ŷ(i+1).
Table 3: MAD error of the predicted values for different models in STS-3.
We observe that, in general, the AC method performs best, apart from the hypothetical 'Actual Path' model.
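The moving-window bookkeeping and the MAD score can be sketched as follows. The forecasting rule Ŷ(i+1) = a·X(i+1 − x(i)), with a known and the lag read off a given path, is our assumption about the prediction set-up described above, and the function names are hypothetical. Forecasts that would need a value of X beyond time i are skipped, so only information available at time i is used.

```python
import numpy as np


def rolling_windows(n, size=100):
    """(start, stop) pairs of windows that advance one time step at a time."""
    return [(s, s + size) for s in range(n - size + 1)]


def forecast_mad(X, Y, lag_path, a):
    """MAD of one-step forecasts Yhat(i+1) = a * X(i+1 - lag_path[i])."""
    errors = []
    for i in range(len(Y) - 1):
        src = i + 1 - int(lag_path[i])
        # Only use X values already observed at time i (a genuine forecast).
        if 0 <= src <= i:
            errors.append(abs(Y[i + 1] - a * X[src]))
    return float(np.mean(errors))
```

For a series of length 300 and windows of size 100, `rolling_windows` yields the 201 windows mentioned above. Because MAD averages absolute errors, it reflects the typical loss per forecast rather than the worst case.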
Thus, although TOP gives better results than a classical correlation approach, as described in [8], it performs poorly compared with the AC approach. The DTW measure also performs worse than the AC measure. The present section uses the AC measure to analyse the network evolution of foreign exchange currencies as the COVID-19 outbreak unfolds. Recently, several speculations have been raised regarding the status of the US dollar (USD) as the world's reserve currency, including the view that USD may soon lose its dominance in that role. Many experts are of the view that USD may be replaced by a basket comprising RUB, CNY, EUR, oil-backed OPEC currencies, etc. Also, since the recent outbreak of COVID-19 in China first and most severely affected the Chinese stock market, it is natural to study the topology of correlation networks among major currencies and the topology evolution of the Foreign Exchange (FX) market. Topology network analysis using the Minimum Spanning Tree (MST) was first introduced in [21] to study stock prices in financial markets. Jang et al. [22] used topology network analysis to efficiently illustrate the structural and market properties of the financial market. This tool has also been applied to the financial markets of different regions of the world [23]-[26]. A topology network analysis of the FX market is carried out in [27], where the FX prices of 28 currencies over the 12 years from 1990 to 2002 are analysed; the authors conclude that USD is the leading currency in the world. Naylor et al. [28] also applied this tool to the FX market, using NZD and USD as numeraires. They found that South-East Asian currencies grouped strongly together during the South-East Asian crisis period. In most past analyses of network evolution, correlation has been the preferred metric. Jin et al. [13] used the DTW measure for such an analysis of foreign exchange data.
The DTW measure aligns the two time series in time, so it can also be used when the two time series are not of the same length. Also, it is costly and difficult to obtain foreign exchange prices of many countries over a long duration, owing to the different operating hours of exchanges in different countries. The DTW measure can be used even if the time series contain several missing values, without any further pre-processing step to remove or impute them. Thus, the DTW measure provides a good alternative to the correlation measure, as well argued in [13]. The proposed AC measure shares all these advantages over the correlation measure. Further, the AC measure incorporates the lead-lag effect into its value better than the DTW measure does: it chooses the path along which the correlation is maximal, thereby incorporating any information about the lead-lag relationship. As described in [29], [30], lead-lag relationships may exist in the foreign exchange market too, so their effect should not be ignored entirely. As we shall see later in the discussion section, the proposed AC measure chooses the zero-lag path for most of the major currencies, and any deviation is very small. Thus, the proposed AC measure retains the interpretability of the correlation measure while incorporating the effect of any lead-lag relationship. The data consist of the foreign exchange prices of the currencies of 29 prominent countries of the world against NZD (see Table 5). The data were sourced from the Thomson Reuters Eikon platform. We choose NZD as the numeraire, as was done in [13], [28]. The AC measure is used to construct the Minimum Spanning Tree (MST). MST requires the distance measure to be a metric, i.e., it should satisfy the triangle inequality. As discussed earlier, the proposed AC measure is a loose metric.
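Whether a dissimilarity matrix satisfies the triangle inequality can be checked by a brute-force scan over all ordered triplets; for N = 29 currencies this is only 29·28·27 = 21,924 checks. The sketch below is a minimal illustration; the function name and the tolerance `tol` (a guard against floating-point noise) are hypothetical choices.

```python
import itertools

import numpy as np


def triangle_violations(D, tol=1e-12):
    """Return every ordered triplet (i, j, k) with D[i, k] > D[i, j] + D[j, k] + tol."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    bad = []
    for i, j, k in itertools.permutations(range(n), 3):
        if D[i, k] > D[i, j] + D[j, k] + tol:
            bad.append((i, j, k))
    return bad
```

An empty result means every triplet in the matrix satisfies the inequality, which is the condition needed before treating the loose metric as a metric for the MST analysis.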
The alignment path in the proposed measure makes the final distance slightly different from the Euclidean metric (see Section 4), but this difference is generally not large enough to violate the triangle inequality. The triangle inequality is more likely to be violated when all three points lie close to a straight line. This is highly unlikely here, as the time series are very high-dimensional points and, further, they are normalised to lie on the unit sphere during the calculation of the loose metric. We also verify empirically that all the triplets in the two distance matrices, corresponding to the two parts of the data, satisfy the triangle inequality. Hence, as was done earlier [13], the MST-based network approach can still be used for the analysis. Here, we describe the four measures used to evaluate the MST in the study of the FX market.
1) Mean dissimilarity. It is defined as
$$\bar{D} = \frac{2}{N(N-1)} \sum_{i<j} D_{ij},$$
where D is the N×N dissimilarity matrix and N is the total number of currencies, i.e., 29. The present paper calls the proposed AC measure a dissimilarity measure, as opposed to a similarity measure (see [13]), because it increases with the farness, or dissimilarity, between the two objects.
2) Normalised tree length. It is given by
$$L = \frac{1}{N-1} \sum_{(i,j) \in \Theta} D_{ij},$$
where Θ is the set of edges present in the MST. This measure has also been used in [22] to evaluate the MST.
3) Average path length. It is given by
$$\bar{l} = \frac{2}{N(N-1)} \sum_{i<j} l_{ij},$$
where l_ij is the sum of the weights along the shortest path from node i to node j. This measure gives the average minimum path distance between any two nodes in the MST [31].
4) Number of non-leaf nodes. It is defined as the number of non-leaf nodes present in the graph, and it helps to judge the looseness of the MST.
Table 6: Evaluation measures for the two topological networks.
Non-leaf Nodes: 10, 13
As observed earlier [13], the mean dissimilarity measure decreases during the crisis (see Table 6). 'Average Lead/lag' is the average value of the lead-lag along the aligned path.
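The MST construction and the four evaluation measures can be sketched as follows. The text does not say which MST algorithm was used; this sketch uses Kruskal's algorithm with a union-find structure, and it assumes the usual definitions of mean dissimilarity, normalised tree length and average path length. In a tree the unique path between two nodes is also the shortest one, so path lengths come from a traversal started at every node. Function names are hypothetical.

```python
import numpy as np


def kruskal_mst(D):
    """Edges (i, j, weight) of a minimum spanning tree of the complete graph D."""
    n = len(D)
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for w, i, j in sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n)):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


def mst_measures(D):
    """Mean dissimilarity, normalised tree length, average path length and
    number of non-leaf nodes for the MST of the dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    tree = kruskal_mst(D)
    adj = {u: [] for u in range(n)}
    for i, j, w in tree:
        adj[i].append((j, w))
        adj[j].append((i, w))
    total = 0.0
    for s in range(n):  # depth-first traversal from every node
        dist, stack = {s: 0.0}, [s]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        total += sum(dist.values())
    return {
        "mean_dissimilarity": float(D[np.triu_indices(n, k=1)].mean()),
        "tree_length": float(sum(w for _, _, w in tree) / (n - 1)),
        "avg_path_length": total / (n * (n - 1)),  # each pair counted twice
        "non_leaf_nodes": sum(1 for u in range(n) if len(adj[u]) > 1),
    }
```

For four points on a line at positions 0, 1, 2 and 3, the MST is the chain 0-1-2-3, so the tree length is 1, both the mean dissimilarity and the average path length are 10/6, and there are two non-leaf nodes.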
'Non-zero ratio' is the ratio of the number of points on the lead-lag path that differ from zero to the total number of points on the path. As seen in Table 7, the optimal lead-lag path is mostly zero. This is clearly observed for the pairs with the lowest distance measure. The lead-lag path sometimes deviates from zero, but the deviations are minor and very rare. In the last few pairs, although the lead-lag path differs from zero, the correlation values are close to zero. This supports the earlier finding [30] that a significant lead-lag relationship is unlikely to exist in FX markets at a frequency of 1 minute or lower. Still, the proposed measure accounts for minor corrections, which may have inadvertently entered market prices owing to factors such as delays in information transmission or human error. Several studies can be used to extend the analysis of correlation-based topological networks further. These studies try to overcome a drawback of the MST, namely the loss of information. They include [32]-[35], which construct graphs that retain more information than the MST. Since all these techniques employ the correlation coefficient, the present research could be extended in their direction. Another area where this research may be useful is the analysis of tick-by-tick data, as in [36]. Tick-by-tick data need to be aligned in time, which the proposed measure does. This time alignment is especially important in the FX market, where the liquidity and volumes of different currencies differ vastly. The lead-lag relationship also exists substantially in tick-by-tick data, so it will be interesting to see how this research extends to such data. The proposed technique can also be used to analyse the lead-lag relationship of other important time series, as done in [10], [11], [20].
The proposed technique, which helps identify the lead-lag relationship, may be extended to support profitable pairs trading, as successfully accomplished in [37], [38]. It can also be used to improve systems that use lead-lag information to obtain better forecasts, such as [39], [40]. DP is a good tool for solving computationally demanding problems [17], and DP-based algorithms are embedded in computer chips for computation purposes [41]. Much effort has also been put into further reducing the time complexity of DTW, and an algorithm is now available that approximates the DTW measure, albeit crudely, in linear time [42]. The present research can be extended to incorporate these studies. DTW and correlation are two of the most frequently used measures in the temporal data-mining literature, and the proposed measure combines them effectively to achieve better task-
References
[1] Estimation of the lead-lag parameter from non-synchronous data
[2] An Empirical Investigation of the Lead-Lag Relations of Returns and Volatilities among the KOSPI200 Spot, Futures and Options Markets and their Explanations
[3] Investigating the information content of non-cash-trading index futures using neural networks
[4] Lagged correlation-based deep learning for directional trend change prediction in financial time series
[5] The causality of hourly price-volume relationship: An empirical study of mini Taiwan exchange futures
[6] High frequency analysis of lead-lag relationships between financial markets
[7] On the limiting spectral distribution of the covariance matrices of time-lagged processes
[8] Non-parametric determination of real-time lag structure between two time series: The 'optimal thermal causal path' method with applications to economic data
[9] Time-varying lead-lag structure between the crude oil spot and futures markets
[10] Time-dependent lead-lag relationship between the onshore and offshore Renminbi exchange rates
[11] Symmetric thermal optimal path and time-dependent lead-lag relationship: novel statistical tests and application to UK and US real-estate and monetary policies
[12] Using dynamic time warping to find patterns in time series
[13] Similarity measure and topology evolution of foreign exchange markets using dynamic time warping method: Evidence from minimal spanning tree
[14] On the Effect of Endpoints on Dynamic Time Warping
[15] Dynamic Programming Algorithms in Speech Recognition
[16] A Novel Approximation to Dynamic Time Warping allows Anytime Clustering of Massive Time Series Datasets
[17] Dynamic programming for NP-hard problems
[18] Selecting stock pairs for pairs trading while incorporating lead-lag relationship
[19] On the Metric Properties of Dynamic Time Warping
[20] Time-varying lead-lag structure between the crude oil spot and futures markets
[21] Hierarchical structure in financial markets
[22] Currency crises and the evolution of foreign exchange market: Evidence from minimum spanning tree
[23] Characteristics of the Korean stock market correlations
[24] Correlation study of the Athens Stock Exchange
[25] An ever-closer union? Examining the evolution of linkages of European equity markets via minimum spanning trees
[26] Topological properties of stock market networks: The case of Brazil
[27] Cross-country hierarchical structure and currency crises
[28] Topology of foreign exchange markets using hierarchical structure methods
[29] Cross-correlations between Renminbi and four major currencies in the Renminbi currency basket
[30] Lead-lag relationships in foreign exchange markets
[31] Statistical Analysis of Weighted Networks
[32] A tool for filtering information in complex systems
[33] Statistical analysis of financial networks
[34] A network analysis of the Chinese stock market
[35] A network perspective of the stock market
[36] Ultra-high-frequency lead-lag relationship and information arrival
[37] High Frequency Statistical Arbitrage Via the Optimal Thermal Causal Path
[38] Statistical arbitrage with optimal causal paths on high-frequency data of the S&P 500
[39] Forecasting stock market crisis events using deep and statistical machine learning techniques
[40] Predicting stock index increments by neural networks: The role of trading volume under different horizons
[41] PACE: A dynamic programming algorithm for hardware/software partitioning
[42] Toward accurate dynamic time warping in linear time and space
The authors were supported by a research grant from IIT Delhi, India.