Excess Out-of-Sample Risk and Fleeting Modes
Jean-Philippe Bouchaud, Iacopo Mastromatteo, Marc Potters, Konstantin Tikhonov
Date: 2022-05-02

Abstract: Using Random Matrix Theory, we propose a universal and versatile tool to reveal the existence of "fleeting modes", i.e. portfolios that carry statistically significant excess risk, signalling ex-post a change in the correlation structure in the underlying asset space. Our proposed test is furthermore independent of the "true" (but unknown) underlying correlation structure. We show empirically that such fleeting modes exist both in futures markets and in equity markets. We propose a metric to quantify the alignment between known factors and fleeting modes, and identify momentum as a source of excess risk in the equity space.

Introduction

Managing the risk of large portfolios requires the knowledge of equally large covariance matrices, describing the whole array of pairwise cross-correlations between the assets included in the portfolio. As is well known by now, the empirical determination of such covariance matrices is difficult, for at least two different reasons. One is that even in a stationary world, that is, a world described by an unknown underlying stochastic process with time-independent parameters, empirical covariance matrices are soiled by a large amount of measurement noise, which only goes to zero as $\sqrt{N/T}$, where $N$ is the number of assets and $T$ the number of data points in the time direction; for a recent review, see [1]. Typical numbers are $N = 500$ stocks in a portfolio and $T = 1000$ days of data (corresponding to 4 years), giving $\sqrt{N/T} \approx 0.7$, which is by no means small!
A number of techniques have been proposed over the years to "clean" the empirical covariance matrix as efficiently as possible, such that one approaches the "true" covariance matrix as closely as possible [2, 3, 4]; for reviews, see [5, 6] and refs. therein. Such cleaning schemes, some based on sophisticated Random Matrix Theory techniques, do help in reducing the discrepancy between "out-of-sample" risk (i.e. risk realized in a period outside the training sample) and "in-sample" risk (i.e. risk estimated on the same period as the training sample).

However, the assumption of a stationary world is certainly too naive to describe financial markets. For one thing, volatility can strongly fluctuate from one period to the next, so "out-of-sample" risk may be larger or smaller than "in-sample" risk simply because of realized volatility. This is a well-studied issue, which can be partly mitigated by the use of sophisticated volatility models and/or by using the forward-looking, implied volatility from option markets. In this study, we are rather concerned with correlation risk. As a striking example, think of the correlation between the daily price changes of the S&P500 index and the US T-Bond. For many years before 1997, it hovered around +0.5, before suddenly switching sign around the so-called Asian crisis. It then remained in negative territory, in a "flight-to-safety" mode, for more than 20 years, before possibly switching again in 2021/2022; time will tell [7]. More generally, one can expect that as macroeconomic conditions evolve, the whole correlation structure between financial assets also evolves. Several ideas to quantify such a genuine evolution of correlations have been discussed in the past [8, 9, 10, 11], in particular the interesting notion of "market states" [12, 13].
The main difficulty is to disentangle measurement noise, which leads to an apparent evolution of the empirical covariance matrix between two non-overlapping periods, from any possible evolution of the underlying covariance matrix $\mathbf{C}$. In Ref. [14], two of us proposed a non-parametric method based on the overlap of the eigenvectors of $\mathbf{E}_{\rm in}$ and those of $\mathbf{E}_{\rm out}$, where "in" and "out" refer, respectively, to the in-sample and out-of-sample periods. Quite interestingly, our proposal did not require the knowledge of $\mathbf{C}$, only that it was time independent. In this note, we propose an alternative non-parametric test, simpler and more transparent, which again does not rely on the knowledge of $\mathbf{C}$, and which allows one to diagnose periods of statistically significant excess correlation risk and to identify the directions (in asset space) along which such excess risk manifests itself.

Let $X = X_{i,t}$ be the return data set, where $i$ is the asset label and $t$ the time label. The in-sample covariance matrix $\mathbf{E}_{\rm in}$ is defined as

$$(\mathbf{E}_{\rm in})_{ij} := \frac{1}{T_{\rm in}} \sum_{t \in {\rm in}} X_{i,t} X_{j,t}, \qquad (1)$$

where $T_{\rm in}$ is the length of the in-sample period. The out-of-sample covariance matrix $\mathbf{E}_{\rm out}$ is defined similarly, with $T_{\rm out}$ the length of the out-of-sample period. Let us now introduce the matrix $\mathbf{M}$ defined as:

$$\mathbf{M} := \Omega\, \mathbf{E}_{\rm in}^{-1/2}\, \mathbf{E}_{\rm out}\, \mathbf{E}_{\rm in}^{-1/2}\, \Omega^{\top},$$

where $\mathbf{E}_{\rm in}^{1/2}$ is defined as the symmetric matrix square-root of $\mathbf{E}_{\rm in}$. The intuitive meaning of $\mathbf{M}$ is as follows. By defining $Y_{\rm in} := \Omega\, \mathbf{E}_{\rm in}^{-1/2}\, X_{\rm in}$, where $\Omega$ is an arbitrary rotation matrix, we construct a set of $N$ synthetic assets (or portfolios) that are by construction ortho-normal, i.e. each synthetic asset is of unit risk and uncorrelated (in sample) with all other synthetic assets. Now, the out-of-sample covariance matrix of these synthetic assets is given by $\mathbf{M}$, whose eigenvectors specify a new set of uncorrelated synthetic assets, with variances given by the eigenvalues $\lambda_a$ (which are independent of $\Omega$).
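A minimal numerical sketch of this construction (assuming NumPy; the function name and toy dimensions are ours, not the paper's) builds $\mathbf{M}$ from two return panels and checks that, in a stationary world with small $q_{\rm in}$ and $q_{\rm out}$, all eigenvalues sit near one:

```python
import numpy as np

def risk_overrealization_spectrum(X_in, X_out):
    """Eigenvalues of M = E_in^{-1/2} E_out E_in^{-1/2}.

    X_in, X_out: (N, T) return panels. The rotation Omega only relabels
    the synthetic assets and leaves the spectrum unchanged, so it is
    omitted here."""
    E_in = X_in @ X_in.T / X_in.shape[1]
    E_out = X_out @ X_out.T / X_out.shape[1]
    # symmetric inverse square root of E_in via its eigendecomposition
    w, V = np.linalg.eigh(E_in)
    E_in_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = E_in_inv_sqrt @ E_out @ E_in_inv_sqrt
    return np.sort(np.linalg.eigvalsh(M))[::-1]   # descending order

# stationary world: both periods drawn from the same true covariance (here C = 1)
rng = np.random.default_rng(0)
N, T_in, T_out = 50, 5000, 5000          # q_in = q_out = 0.01, both tiny
lam = risk_overrealization_spectrum(rng.standard_normal((N, T_in)),
                                    rng.standard_normal((N, T_out)))
print(lam[0], lam[-1])                    # both close to 1
```

With $q_{\rm in} = q_{\rm out} = 0.01$, no synthetic portfolio materially over-realizes its risk, as the text argues.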
Since the in-sample risk of the synthetic assets has been normalized to one, the eigenvectors associated with eigenvalues $\lambda_a > 1$ thus correspond to linear combinations of synthetic assets that over-realize their risk in the out-of-sample period. In the following, we will choose $\Omega$ such that the synthetic assets are simply the principal risk components $v_\mu$ of the in-sample covariance matrix $\mathbf{E}_{\rm in}$. We will call the directions $v_\mu$ the statistical risk modes.

Suppose $q_{\rm in} = N/T_{\rm in}$ and $q_{\rm out} = N/T_{\rm out}$ are both very small and the world is stationary, with true covariance matrix $\mathbf{C}$. Then, clearly,

$$\mathbf{E}_{\rm in} \approx \mathbf{E}_{\rm out} \approx \mathbf{C} \quad \Longrightarrow \quad \mathbf{M} \approx \mathbb{1}.$$

Hence, in this case, all eigenvalues of $\mathbf{M}$ are very close to unity: no portfolio over-realizes its risk, as expected. In the case where $q_{\rm in}$ and $q_{\rm out}$ take arbitrary values, one first notes that, by definition, (white) Wishart matrices $\mathbf{W}$ correspond to empirical covariance matrices when $\mathbf{C} = \mathbb{1}$. Hence one can write, still assuming stationarity:

$$\mathbf{E}_{\rm in} = \mathbf{C}^{1/2}\, \mathbf{W}_{\rm in}\, \mathbf{C}^{1/2}, \qquad \mathbf{E}_{\rm out} = \mathbf{C}^{1/2}\, \mathbf{W}_{\rm out}\, \mathbf{C}^{1/2}.$$

Now, since the characteristic polynomial of $\mathbf{M}$ is the same as that of $\mathbf{E}_{\rm in}^{-1/2}\, \mathbf{E}_{\rm out}\, \mathbf{E}_{\rm in}^{-1/2}$, which in turn is the same as that of $\mathbf{E}_{\rm in}^{-1}\, \mathbf{E}_{\rm out}$, and since substituting the Wishart forms above gives a matrix similar to $\mathbf{W}_{\rm in}^{-1}\mathbf{W}_{\rm out}$, we conclude that the eigenvalues of $\mathbf{M}$ are actually independent of $\mathbf{C}$, and equal to those of our theoretical benchmark

$$\mathbf{M}_{\rm th.} := \mathbf{W}_{\rm in}^{-1}\, \mathbf{W}_{\rm out},$$

where $\mathbf{W}_{\rm in}$, $\mathbf{W}_{\rm out}$ are independent Wishart matrices of parameters, respectively, $q_{\rm in}$ and $q_{\rm out}$. In the following, we will assume $q_{\rm in} < 1$, i.e. $T_{\rm in} > N$, so that $\mathbf{E}_{\rm in}$ is invertible.

The matrix $\mathbf{M}_{\rm th.}$, a close relative of Jacobi random matrices, is the product of a Wishart and an inverse-Wishart matrix, and its spectrum can easily be computed, see e.g. [1]. Denoting $\lambda$ its eigenvalues, the probability density function of $\lambda$ reads:

$$\rho(\lambda) = \left[1 - q_{\rm out}^{-1}\right]_+ \delta(\lambda) + \frac{(1 - q_{\rm in})\,\sqrt{\left[(\lambda_+ - \lambda)(\lambda - \lambda_-)\right]_+}}{2\pi\, \lambda\, (q_{\rm in}\lambda + q_{\rm out})}, \qquad (6)$$

with

$$\lambda_\pm = \frac{\left(1 \pm \sqrt{q_{\rm in} + q_{\rm out} - q_{\rm in} q_{\rm out}}\right)^2}{(1 - q_{\rm in})^2},$$

where the symbol $[\,\cdot\,]_+$ denotes the positive part. Note that for $q_{\rm out} > 1$, a finite fraction of eigenvalues is exactly zero, as expressed by the Dirac delta function.
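The claim that the spectrum of $\mathbf{M}$ equals that of $\mathbf{W}_{\rm in}^{-1}\mathbf{W}_{\rm out}$ can be probed numerically. A short sketch (NumPy; the edge formula follows our reconstruction above, and the helper names and parameters are illustrative):

```python
import numpy as np

def wishart(N, T, rng):
    """White Wishart matrix W = X X^T / T, with X an N x T Gaussian panel."""
    X = rng.standard_normal((N, T))
    return X @ X.T / T

def theoretical_edges(q_in, q_out):
    """Support edges lambda_± of the null spectrum (assumes q_in < 1)."""
    s = np.sqrt(q_in + q_out - q_in * q_out)
    return ((1 - s) / (1 - q_in)) ** 2, ((1 + s) / (1 - q_in)) ** 2

rng = np.random.default_rng(1)
N, q_in, q_out = 500, 0.25, 0.25
W_in = wishart(N, int(N / q_in), rng)
W_out = wishart(N, int(N / q_out), rng)
# eigenvalues of M_th = W_in^{-1} W_out (real, since both factors are SPD)
lam = np.sort(np.linalg.eigvals(np.linalg.solve(W_in, W_out)).real)
lo, hi = theoretical_edges(q_in, q_out)
print(lo, lam[0], lam[-1], hi)    # empirical spectrum lies inside [lo, hi]
```

For large $N$ the empirical extremes hug the theoretical edges from inside, consistent with the finite-$N$ undershoot noted in the figure captions below.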
This density has mean

$$\langle \lambda \rangle = \frac{1}{1 - q_{\rm in}}.$$

Our null hypothesis test is thus the following: if the true underlying covariance matrix is the same in-sample and out-of-sample, the non-zero eigenvalues $\lambda$ of $\mathbf{M}$ should, for large $N$, all lie within the interval $\lambda_- \leq \lambda \leq \lambda_+$, with a distribution compatible with Eq. (6); see Fig. 1 for a particular illustration and numerical simulations.

Several limiting cases are interesting to discuss. One is when the in-sample and out-of-sample periods have the same length, i.e. $T_{\rm in} = T_{\rm out} = T$, or $q_{\rm in} = q_{\rm out} = q$. One then finds

$$\lambda_\pm = \frac{\left(1 \pm \sqrt{q(2-q)}\right)^2}{(1-q)^2}.$$

When both periods are very long compared to $N$, one has $q \to 0$ and therefore the interval where the eigenvalues of $\mathbf{M}$ are expected to be found is

$$1 - 2\sqrt{2q} \lesssim \lambda \lesssim 1 + 2\sqrt{2q},$$

which, as expected, tends to a Dirac mass at $\lambda = 1$ for $q = 0$.

[Fig. 1: Theoretical density of eigenvalues (Eq. (6)) for $q_{\rm in} = 1/4$ and $q_{\rm out} = 1/4$ (left) and $q_{\rm out} = 4$ (right), compared with a numerical simulation with $N = 1000$. Note that when $q_{\rm out} > 1$, as on the right, there is a Dirac mass at zero with weight $1 - q_{\rm out}^{-1}$. Note also that the numerical histogram drops to zero slightly below the theoretical value $\lambda_{\max}$; this is a finite-$N$ effect, see also Fig. 2.]

Now, consider another interesting regime where $T_{\rm in} \gg T_{\rm out} > N$, i.e. a long in-sample period and a relatively short out-of-sample period, aiming at detecting abrupt "regime shifts". In this regime, where $q_{\rm in} \to 0$, we recover precisely the Marčenko-Pastur distribution with parameter $q = q_{\rm out}$ [1], as it should be, since in that limit $\mathbf{E}_{\rm in} \equiv \mathbf{C}$ and $\mathbf{M}_{\rm th.} \equiv \mathbf{W}_{\rm out}$. The non-zero eigenvalues of $\mathbf{M}_{\rm th.}$ then satisfy

$$\left(1 - \sqrt{q_{\rm out}}\right)^2 \leq \lambda \leq \left(1 + \sqrt{q_{\rm out}}\right)^2,$$

with an additional Dirac mass of weight $1 - q_{\rm out}^{-1}$ when $q_{\rm out} > 1$.

In order to quantify by how much real financial returns differ from their stylized counterpart above, we have constructed two data sets consisting of daily returns $X = X_{i,t}$ of two different groups of financial instruments.
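As a sanity check on the density and its mean (a sketch assuming NumPy; `rho` implements Eq. (6) as we have reconstructed it, for $q_{\rm in} < 1$), one can verify numerically that the continuous part integrates to one and has mean $1/(1-q_{\rm in})$ when $q_{\rm out} < 1$:

```python
import numpy as np

def rho(lam, q_in, q_out):
    """Continuous part of the null density, Eq. (6) as reconstructed
    in the text; valid for q_in < 1."""
    s = np.sqrt(q_in + q_out - q_in * q_out)
    lam_m = ((1 - s) / (1 - q_in)) ** 2
    lam_p = ((1 + s) / (1 - q_in)) ** 2
    supp = np.clip((lam_p - lam) * (lam - lam_m), 0.0, None)
    return (1 - q_in) * np.sqrt(supp) / (2 * np.pi * lam * (q_in * lam + q_out))

# q_in = q_out = 1/4: no Dirac mass, so the continuous part carries all the weight
q_in = q_out = 0.25
s = np.sqrt(q_in + q_out - q_in * q_out)
lam = np.linspace(((1 - s) / (1 - q_in)) ** 2,
                  ((1 + s) / (1 - q_in)) ** 2, 400001)
d = lam[1] - lam[0]
w = rho(lam, q_in, q_out)
mass = np.sum(w) * d            # should be close to 1
mean = np.sum(lam * w) * d      # should be close to 1 / (1 - q_in) = 4/3
print(mass, mean)
```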
The first data set comprises a set of $N = 98$ liquid futures, covering different sectors (stock indices, commodities, FX, yields), expiry dates and geographies (America, Europe, Asia, and a smaller set of developing markets), in a period ranging from 2006-01-01 to 2022-03-01. The second data set consists of $N = 300$ US stocks in the period 2002-05-27 to 2022-01-21. In order to get rid of any spurious volatility fluctuations and focus only on correlations, we normalize each daily return by its own intraday volatility, constructed using a Garman-Klass estimator based on Close-High-Low-Close data.

In both universes, we construct a set of rolling estimators $\mathbf{E}_{\rm in}(t)$ and $\mathbf{E}_{\rm out}(t)$ according to the following prescription. First, for each day after a "burn-in" period $t > T_{\rm in} + T_{\rm out}$, we consider the (in) interval as the one comprising returns belonging to $[t - T_{\rm out} - T_{\rm in},\, t - T_{\rm out})$, whereas the (out) interval is built with the returns belonging to $[t - T_{\rm out},\, t)$. This construction ensures that: (i) intervals built at time $t$ only use data available at day $t$; (ii) the intervals are contiguous but perfectly disjoint; (iii) all in-sample and out-of-sample intervals have exactly the same lengths $T_{\rm in}$ and $T_{\rm out}$; (iv) under the hypothesis of i.i.d. returns, the estimators $\mathbf{E}_{\rm in}(t)$ and $\mathbf{E}_{\rm out}(t)$ are distributed according to the null model described above.

For both futures and stocks, we have decided to fix $q_{\rm in} = 1/4$ and $q_{\rm out} = 4$, which corresponds to an in-sample period of a year and a half for futures (about five years for stocks) and an out-of-sample interval of approximately one month for futures (slightly less than four months for stocks). We then apply the definition Eq. (1) to both in-sample and out-of-sample intervals, obtaining rolling sets of covariance matrices indexed by $t$ and denoted as $\mathbf{E}_{\rm in}(t)$ and $\mathbf{E}_{\rm out}(t)$. Finally, we build risk over-realization matrices $\mathbf{M}(t) := \mathbf{E}_{\rm in}^{-1/2}(t)\, \mathbf{E}_{\rm out}(t)\, \mathbf{E}_{\rm in}^{-1/2}(t)$ (up to the rotation $\Omega$), from which we can extract eigenvalues $\{\lambda_a(t)\}$ that can be compared to the ones prescribed by our null hypothesis.
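The window bookkeeping above can be sketched as follows (NumPy; a minimal illustration with toy dimensions, not the paper's actual pipeline):

```python
import numpy as np

def rolling_windows(X, T_in, T_out):
    """Yield (t, X_in, X_out) for contiguous, disjoint in/out intervals.

    X: (N, T) array of volatility-normalized daily returns.
    The (in) interval is [t - T_out - T_in, t - T_out) and the (out)
    interval is [t - T_out, t), so only data available at day t is used."""
    N, T = X.shape
    for t in range(T_in + T_out, T + 1):
        yield t, X[:, t - T_out - T_in : t - T_out], X[:, t - T_out : t]

# toy example with i.i.d. returns and q_in = 1/4, q_out = 4 as in the text
rng = np.random.default_rng(2)
N = 40
T_in, T_out = 4 * N, N // 4                 # q_in = 1/4, q_out = 4
X = rng.standard_normal((N, T_in + T_out + 5))
windows = list(rolling_windows(X, T_in, T_out))
print(len(windows))                          # one window per admissible day
```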
The corresponding eigenvectors also contain important information, to be discussed below. Fig. 2 illustrates the result of such a comparison for both futures (left panel) and stocks (right panel), indicating that in both cases we detect significant departures from the null model. However, the average eigenvalue distribution does not distinguish an intermittent scenario, where risk over-realisation is clustered in time, from a uniform scenario where risk is always over-realized. Part of this information is summarized in Fig. 3, where the evolution of the top eigenvalue $\lambda_1(t)$ is displayed for both our data sets and compared to the value expected under our null model. Departures from the null model are relatively mild in some periods and stronger in others, with clear spikes, notably in the futures space.

[Fig. 3 caption fragment: exceedances of the null-model edge $\lambda_{\max}$ (Eq. (9)) are significant for a substantial fraction of days. Note that the simulated null model with Gaussian returns (grey line) slightly undershoots the theoretical value of $\lambda_{\max}$ because of finite-$N$ effects.]

The directions along which excess out-of-sample risk is large are given by the eigenvectors $z_a$ of $\mathbf{M}$ corresponding to the largest eigenvalues $\lambda_a$. As explained above, because of our choice of $\Omega := v$, these eigenvectors can be interpreted as portfolios of the (in-sample) statistical risk modes $v_\mu$, $\mu = 1, \ldots, N$. We will call the eigenvectors $z_a$ fleeting modes.

The first question one would like to ask is how close the top fleeting mode $z_1$ (corresponding to the top eigenvalue $\lambda_1$) is to the dominant in-sample risk modes $v_\mu$. We thus define the cumulative squared overlap

$$\psi_n := \sum_{\mu=1}^{n} (z_1 \cdot v_\mu)^2,$$

where $z_1 \cdot v_\mu$ is the $\mu$th risk-mode component of $z_1$. Note that $\psi_n \equiv 1$ for $n = N$, because the set $\{v_\mu\}$ forms an orthonormal basis.

Fig. 4 shows $\psi_n$ as a function of $n$, both for futures and for equities, averaged over (a) days where the over-realisation of risk is in the top 10% and (b) days where the over-realisation of risk is in the bottom 90%. We compare these cumulative overlaps with a stationary null model where the true covariance matrix is $\mathbf{E}_{\rm in}$. The results are quite striking: whereas in most cases the null model explains rather well the direction in which risk is over-realised, the top 10% cases are clearly different. For futures, large excess out-of-sample risk is concentrated in the statistical risk modes with the smallest in-sample risk, whereas for stocks, excess risk is in the direction of the statistical modes with the largest in-sample risk (see also the related discussion in [10, 14]).

For futures, risk over-realisation tends to come from the sudden divergence of the spread between tightly correlated contracts, for example those associated with the delivery of the same underlying at different expiry dates. The spread between such contracts is typically close to zero, but exogenous shocks can lead such a typically quiet direction to generate anomalous risk along the term structure of the contract. As an example, we observe in Fig. 3 a spike around April 21st, 2020, corresponding to the days in which the price of crude oil futures was strongly stressed by a COVID-induced demand shock. Our metric thus identifies such directions as fleeting modes: in the presence of tiny in-sample risk directions, even a moderate out-of-sample risk leads to a very strong spike in the top eigenvalue of $\mathbf{M}$. In contrast to futures, the absence of strong mechanical correlations between equity instruments leads to a smaller loading of fleeting modes on low-risk modes, and a larger loading on high-risk modes (industrial sectors and/or equity factors), which tend to over-realize their risk in a systematic fashion.
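The cumulative squared overlap $\psi_n$ is straightforward to compute once $z_1$ is expressed in the risk-mode basis; a minimal sketch (NumPy; the toy basis below stands in for the actual in-sample modes):

```python
import numpy as np

def cumulative_squared_overlap(z1, V):
    """psi_n = sum_{mu <= n} (z1 · v_mu)^2 for n = 1..N.

    z1: top fleeting mode (unit vector); V: columns v_mu are the
    in-sample statistical risk modes, sorted by decreasing risk."""
    return np.cumsum((V.T @ z1) ** 2)

# toy check: for any unit vector, psi_N must equal 1 exactly,
# since the v_mu form an orthonormal basis
rng = np.random.default_rng(3)
N = 20
V, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthonormal basis
z1 = rng.standard_normal(N)
z1 /= np.linalg.norm(z1)
psi = cumulative_squared_overlap(z1, V)
print(psi[-1])    # 1.0 up to floating point
```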
A natural question is whether known factors could be at the origin of such excess risks in equity portfolios. A natural candidate is the momentum factor. Indeed, let us consider the case $T_{\rm in} \gg T_{\rm mom}$, where $T_{\rm mom}$ is the time-scale used to build the momentum signal. The in-sample risk model is then blind to such a factor, because the directions defined by the momentum signal randomly rotate over time and average out when $T_{\rm in} \gg T_{\rm mom}$. Hence, because of the impact of investors trading in and out [15], these factors should over-realize their expected risk provided $T_{\rm mom} \gtrsim T_{\rm out}$.

In order to quantify the role of factors (including momentum) in the observed excess risk, we define a metric that measures the alignment of a given factor direction with the subspace spanned by the $n$ largest eigenvectors of $\mathbf{M}(t)$. More precisely, we denote by $z_{f,i}(t)$ the normalized factor loadings on the real assets $i = 1, \ldots, N$ (with $\|z_f(t)\|^2 = 1$), and similarly rotate the fleeting modes $z_a(t)$ back into the real asset basis. We then consider the following overlap:

$$\phi_n(t) := \sum_{a=1}^{n} \left(z_f(t) \cdot z_a(t)\right)^2.$$

Note that $0 \leq \phi_n(t) \leq 1$, with $\phi_N(t) \equiv 1$, since $\{z_a\}$ is a complete ortho-normal basis.

[Fig. 5: Average overlap $\phi_n(t)$ between the momentum factor and the top fleeting modes. The null model has the same projection amplitude on the statistical risk modes $v_\mu(t)$ as momentum, but with randomly scrambled signs. The contribution of the momentum factor to excess risk is clear.]

Fig. 5 shows the average value of $\phi_n(t)$ over the whole period, for $n \leq 30$, when the factor $f = {\rm mom}$ is the momentum factor for stocks. We compare this result with a null model that has the same projection amplitude on the statistical risk modes $v_\mu(t)$ as momentum, but with randomly scrambled signs. This graph clearly shows that a significant portion of the risk over-realization in the equity space can indeed be explained as exposure to the momentum factor, which is itself buffeted by the price impact of momentum traders.
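The overlap $\phi_n$ and its scrambled-sign null can be sketched as follows (NumPy; the random orthonormal matrices below are stand-ins for the actual fleeting modes and risk modes, and the helper names are ours):

```python
import numpy as np

def factor_overlap(z_f, Z, n):
    """phi_n = sum_{a <= n} (z_f · z_a)^2: alignment of a unit-norm factor
    direction z_f with the subspace of the n largest fleeting modes
    (columns of Z, sorted by decreasing eigenvalue)."""
    return np.sum((Z[:, :n].T @ z_f) ** 2)

def scrambled_null(z_f, V, rng):
    """Null direction with the same projection amplitudes on the
    statistical risk modes v_mu as z_f, but with randomly scrambled signs."""
    amps = V.T @ z_f
    return V @ (amps * rng.choice([-1.0, 1.0], size=amps.size))

rng = np.random.default_rng(4)
N = 30
Z, _ = np.linalg.qr(rng.standard_normal((N, N)))   # stand-in fleeting modes
V, _ = np.linalg.qr(rng.standard_normal((N, N)))   # stand-in risk modes
z_f = rng.standard_normal(N)
z_f /= np.linalg.norm(z_f)
z_null = scrambled_null(z_f, V, rng)
# phi_N is exactly 1 (complete basis); the null keeps unit norm by construction
print(factor_overlap(z_f, Z, N), np.linalg.norm(z_null))
```

Sign scrambling preserves the amplitude spectrum of the factor on the risk modes while destroying its specific orientation, which is exactly the comparison the text describes.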
Using Random Matrix Theory, we have provided a universal and versatile tool to analyse the statistical significance and financial origin of risk over-realisation in large portfolios. The eigenvalues and eigenvectors of an appropriately constructed matrix mixing in-sample and out-of-sample data allow one to identify "fleeting modes", i.e. portfolios that carry significant excess risk, signalling (ex-post) a change in the correlation structure of the underlying asset space. Our proposed test is furthermore independent of the "true" underlying correlation structure, which is obviously unknown to the modeler. We have shown empirically that such fleeting modes exist both in futures markets and in equity markets, and analyzed the directions in which excess risk manifests itself. We have proposed a metric to quantify the alignment between known factors and fleeting modes. As a case in point, momentum exposure clearly appears as a source of excess risk in equity portfolios that is not captured by low-frequency correlation matrices.

References

[1] A First Course in Random Matrix Theory: For Physicists, Engineers and Data Scientists.
[2] Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management.
[3] Spectrum estimation for large dimensional covariance matrices using random matrix theory.
[4] Eigenvectors of some large sample covariance matrix ensembles. Probability Theory and Related Fields.
[5] Cleaning large correlation matrices: tools from random matrix theory.
[6] The power of (non-)linear shrinking: A review and guide to covariance matrix estimation.
[8] Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models.
[9] Principal regression analysis and the index leverage effect.
[10] Eigenvector dynamics: general theory and some applications.
[11] Conditional Correlations and Principal Regression Analysis for Futures.
[12] Identifying states of a financial market.
[13] Risk diversification: a study of persistence with a filtered correlation-network approach.
[14] Overlaps between eigenvectors of correlated random matrices.
[15] Zooming in on equity factor crowding.