Time Series Estimation of the Dynamic Effects of Disaster-Type Shocks
Richard Davis and Serena Ng
July 14, 2021

Abstract: This paper provides three results for SVARs under the assumption that the primitive shocks are mutually independent. First, a framework is proposed to accommodate a disaster-type variable with infinite variance in a SVAR. We show that the least squares estimates of the SVAR are consistent but have non-standard asymptotics. Second, the disaster shock is identified as the component with the largest kurtosis and whose impact effect is negative. An estimator that is robust to infinite variance is used to recover the mutually independent components. Third, an independence test on the residuals pre-whitened by the Choleski decomposition is proposed to test the restrictions imposed on a SVAR. The test can be applied whether the data have fat or thin tails, and to over- as well as exactly-identified models. Three applications are considered. In the first, the independence test is used to shed light on the conflicting evidence regarding the role of uncertainty in economic fluctuations. In the second, disaster shocks are shown to have short-term economic impact arising mostly from feedback dynamics. The third uses the framework to study the dynamic effects of economic shocks post-covid.

The novel coronavirus (covid-19) outbreak has drawn attention to the modeling of rare events such as pandemics and natural disasters. How do we estimate the dynamic effects of disaster-type shocks on economic variables? How do we estimate the dynamic effects of economic shocks when the data are contaminated by rare events that do not have economic origins? Should measures of disasters be modeled as exogenous?
A difficulty in predicting the occurrence of disasters and designing policies to mitigate their impact is that there are few such data points even over a long span. After all, the CDC has only documented four influenza pandemics in the U.S. with deaths in excess of 100,000 over a 120-year period starting in 1900.1 For natural disasters, the 12,000 deaths from the Galveston hurricane of 1900 remain a record, with the 1,200 deaths from Katrina coming in a distant second in terms of casualties. Worldwide, only seven earthquakes since 1500 were larger than 9 in magnitude,2 and September 11 was the only terror attack on U.S. soil with more than 300 deaths, let alone 3,000. Nonetheless, when a rare disaster strikes, it strikes in a ferocious manner, as covid-19 reminds us. Though these events have been intensely studied on a case-by-case basis, it is also of interest to study them over a long time span.3 We apply standard time series methodology to analyze the dynamic effects of rare events by modeling these events as being driven by heavy-tailed shocks. To fix ideas, consider Figure 1, which plots the real cost of 258 natural disasters over the period 1980:1-2019:12, augmented to include 9/11.4 The series is dominated by a few events, with Hurricane Katrina in August 2005 being the largest, accounting for 9.2% of total cost. This is followed by the four weeks in the summer of 2017 when Hurricane Harvey contributed 7% in August, while Hurricanes Irma and Maria in September created a combined cost of 8%. These are followed by 9/11 in 2001 and superstorm Sandy in October 2012, each contributing about 5% of total costs. Another measure of the cost of disasters is the number of lives lost. This series, while not plotted to conserve space, has spikes that are even more extreme. Over the same time period, 49% of disaster-related deaths can be attributed to Hurricanes Maria/Irma, 9/11, and Hurricane Katrina, with the heat wave of 1980 coming in fourth.
Both series have features of a heavy-tailed process, and we will subsequently use sample kurtosis as evidence of tail heaviness. Heavy-tailed data pack a lot of information into a few observations. Because of their large variability, the dynamic effects of disaster shocks should in principle be consistently estimable. Indeed, if all variables in a multivariate system have heavy tails, we show below that the least squares estimator will converge at the fast rate of (T/ln T)^{1/α}, where α is the index of the heavy-tailed shock and T is the sample size. Though the distribution theory is a bit non-standard, the regression framework is the same as in the standard case when all variables have light tails. But while many macroeconomic time series have excess kurtosis, they do not fit the characterization of heavy tails. For example, unemployment and industrial production have kurtosis of less than 10, while the disaster series shown in Figure 1 has kurtosis in excess of 70, and its estimated tail index of approximately one suggests a distribution with infinite variance and possibly infinite mean.5 Beare and Toda (2020) analyze covid-19 cases across US counties and find that the right tail of the distribution has a Pareto exponent close to one.
Footnote 1: These are the Spanish flu in 1918 (675,000 US deaths), the H2N2 virus in 1957-58 (116,000 US deaths), the H3N2 virus in 1968 (100,000 US deaths), and the H1N1 virus in 2009 (12,500 US deaths). Source: https://www.cdc.gov/flu/pandemic-resources/basics/past-pandemics.html.
Footnote 2: Source: https://en.wikipedia.org/wiki/Lists_of_earthquakes#Largest_earthquakes_by_magnitude.
Footnote 3: For a review of methodologies used, see Botzen, Deschenes, and Sanders (2019).
Footnote 4: The series combines data from the National Oceanic and Atmospheric Administration and the Insurance Information Institute, as explained in Ludvigson, Ma, and Ng (2021a).
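Tail indices like the "approximately one" estimate referred to above are typically computed with the Hill (1975) estimator cited in footnote 5. A minimal sketch in Python, using simulated Pareto data rather than the paper's disaster series; the choice of the number of order statistics k is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def hill_estimator(x, k):
    """Hill (1975) estimator of the tail index alpha from the k largest order statistics."""
    x = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # sort descending
    # average log-spacing of the top-k observations relative to the (k+1)-th largest
    logs = np.log(x[:k]) - np.log(x[k])
    return 1.0 / logs.mean()

# Classical Pareto sample with alpha = 1: P(Z > x) = x^{-1} for x >= 1
# (numpy's pareto() draws the Lomax form, so we shift by 1)
z = rng.pareto(1.0, size=100_000) + 1.0
print(hill_estimator(z, k=1000))  # should be close to the true index 1
```

In practice the estimate is sensitive to k; a common diagnostic is to plot the estimator against k and look for a stable region.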
This motivates a new multivariate framework in which finite and infinite variance shocks co-exist in such a way that the economic variables can be affected by heavy-tailed shocks but not dominated by them. Our point of departure is that the n primitive shocks u = (u_1, ..., u_n)' are assumed to be mutually independent, a condition stronger than the commonly used assumption of mutual orthogonality, which is no longer meaningful when one of the shocks has infinite variance. We develop an HL ('heavy-light') framework in which the coefficient estimates on the infinite variance regressors are consistent at a rate of T^{1/α}, still faster than the usual rate of √T. We then show that the disaster shock series can be identified by the magnitude of its kurtosis and the sign of its impact effect. For estimation, we perform an independent components analysis (ICA) based on the distance covariance of the pre-whitened data, an approach first suggested in Matteson and Tsay (2017) for finite variance data. Davis and Fernandes (2022) recently showed that the procedure remains valid when a shock has infinite variance provided its mean is finite. Prewhitening by singular value decomposition is often used to remove correlations prior to ICA estimation so as to focus on the higher order signals. For SVAR applications, prewhitening by Choleski decomposition is more natural since it is already used to identify mutually uncorrelated shocks with a recursive structure. We show that even though the variance of the shocks may not exist, the Choleski decomposition of the sample covariance remains valid. Furthermore, we show that ICA will still recover the shocks in spite of sampling uncertainty in the VAR residuals. To assess the restrictions imposed on the SVAR, we apply a permutation-based procedure to the distance covariance statistic as a test for independence that is robust to infinite variance data.
It complements other SVAR specification tests made possible by the independence assumption, as discussed below. The rest of the paper is structured as follows. Section 2 summarizes the key properties of heavy-tailed linear processes and discusses the implications for VAR estimation. Section 3 presents the HL framework. Consistency and limiting behavior of the least squares estimator for parameters in a VAR are shown. Identification, estimation via distance covariance, and implementation of an independence test are then discussed. Section 4 uses simulations and three applications to assess the properties of the proposed procedure. The appendix contains background material on distance covariance as well as proofs of the main results in Section 3.
Disaster events are rare, and heavy tails can be a useful characterization of their probabilistic structure. Well known heavy-tailed distributions include the Student-t, F, Fréchet, as well as the infinite variance stable and Pareto distributions. A random variable Z is said to have Pareto-like tails with index α if
P(|Z| > x) ∼ C x^{−α} as x → ∞, (1)
where C is a finite and positive constant and P(Z > x)/P(|Z| > x) → p ∈ [0, 1] as x → ∞. Examples include the Cauchy and Pareto distributions. The Gaussian distribution has 'thin' tails that decay faster than an exponential and is not included in this class. The results that follow can be extended to a more general condition on F called regular variation, in which (1) is replaced by P(|Z| > x) = L(x) x^{−α} for a slowly varying function L. The normalizing constants in such an extension become less explicit, so we stick to the Pareto-like tail assumption for tractability. Let d_{1T} = inf{x : P(|Z_1| > x) ≤ 1/T} be the (1 − 1/T)-th quantile of F and d_{2T} = inf{x : P(|Z_0 Z_1| > x) ≤ 1/T} be the corresponding quantile for the joint distribution of the product Z_0 Z_1. The asymptotic behavior of these quantiles for distributions with Pareto-like tails is given in Davis and Resnick (1986).
Footnote 5: The method often used to estimate the tail index is due to Hill (1975).
The population moments of Z_t satisfying (1) are defined only up to order α, since E|Z_t|^δ < ∞ for δ < α while E|Z_t|^δ = ∞ for δ > α. It is possible for the population variance to exist but the population kurtosis to be undefined. But even if the population moments do not exist, the sample moments can still have well defined limits. If Z_t has Pareto-like tails with index α ∈ (1, 2), then Z_t^2 also has Pareto-like tails with index α/2, and, suitably normalized, the sample mean of Z_t, the sample second moment, and the sample autocovariances at lags h > 0 converge to stable random variables S_α, S_{α/2,0}, and S_{α,h} with exponents α, α/2, and α, respectively. Their joint distributions can be found in Davis and Resnick (1986). To gain a sense of the tail properties of the data under investigation, we will make use of the fact that if Z_t is an IID Pareto sequence with tail index α = 1, then the sample kurtosis κ_4, scaled by the sample size, converges to a random variable taking values between zero and one (see Cohen, Davis, and Samorodnitsky (2020)). Hence the maximum kurtosis that can be observed asymptotically is T. Tabulating the distribution for T = 500 and T = 1000 with α = 1, we see that the quantiles roughly double with T. Based on simulations, the values of these quantiles are an upper bound for α ∈ (1, 2). As a point of reference, the disaster series shown in Figure 1 has T = 480 and kurtosis of around 70, which is in the lower 10-th percentile.6 The number-of-deaths series mentioned in the Introduction has kurtosis of 147 and is in the 30-th percentile. In contrast, the kurtosis of a typical macroeconomic time series is under 10, hence the theory for heavy tails would be inappropriate. A multivariate system of time series with different tail properties thus necessitates a different setup. There is a large literature on robust and quantile estimation of the parameters in a linear model to guard against extreme values, which explicitly down-weights outliers.
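The scaling of the sample kurtosis described above is easy to see by simulation. The sketch below (sample size and number of replications are illustrative) checks that κ_4/T stays in (0, 1] for IID Pareto data with α = 1; the bound κ_4 ≤ T itself is a Cauchy-Schwarz consequence of Σc_t^4 ≤ (Σc_t^2)^2:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_kurtosis(x):
    """Uncentered-by-convention sample kurtosis: T * m4 / m2^2."""
    c = np.asarray(x, dtype=float) - np.mean(x)
    return len(c) * (c**4).sum() / (c**2).sum()**2

T = 1000
# kappa_4 / T for 500 replications of an IID Pareto(alpha=1) sample
draws = np.array([sample_kurtosis(rng.pareto(1.0, size=T) + 1.0) / T
                  for _ in range(500)])
print(draws.min(), draws.max())  # every draw lies in (0, 1]
```

Tabulating the quantiles of `draws` for T = 500 versus T = 1000 reproduces the rough doubling of the kurtosis quantiles with T noted in the text.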
Blattberg and Sargent (1971) and Kadiyala (1972) show that the least squares estimator is unbiased when the error in the regression model is drawn from a general symmetric stable Paretian distribution, but it is not the best linear unbiased estimator. In the Cauchy case when α = 1, the best linear unbiased estimator is y_τ/x_τ, where x_τ = max_j X_j.7 A different viewpoint, also the one taken in this paper, is that the extreme values are of interest.8 Under this assumption and fixed regressors, Mikosch and de Vries (2013) provide a finite sample analysis of the tail probabilities of the single equation CAPM estimates to understand why they vary significantly across reported studies. We are interested in estimating dynamic causal effects in a multivariate setting when the regressors are stochastic and one of the primitive shocks has heavy tails. Consider n mean zero variables Y_t = (Y_{t1}, ..., Y_{tn})' represented by a VAR(p):
A(L) Y_t = e_t,
where A(z) = I_n − A_1 z − ... − A_p z^p is the matrix-valued AR polynomial. Provided that det A(z) ≠ 0 for all z ∈ C such that |z| ≤ 1, A(z)^{−1} exists, and the moving-average representation of the model is Y_t = A(L)^{−1} e_t. The standard OLS estimator of A is characterized in (26) and (27). The errors e_t are mapped to an n × 1 vector of primitive shocks u_t = (u_{t1}, ..., u_{tn})' via a (time invariant) matrix B:
e_t = B u_t,
where u_t is usually assumed to be mean zero, mutually and serially uncorrelated, and with Σ_u = E[u_t u_t'] being a diagonal matrix. See, for example, Stock and Watson (2015) and Kilian and Lütkepohl (2017).
Footnote 6: The distribution of S_α can be approximated by simulating j = 1, ..., J draws s_{j,α} = Σ_{m=1}^M (Σ_{i=1}^m e_i)^{−1/α}, where {e_i} are drawn from the exponential distribution.
Footnote 7: Best here means in terms of minimizing dispersion.
Footnote 8: See, for example, two special issues on heavy-tailed data, Paolella, Renault, Samorodnitsky, and Varedas (2013) and Dufour and Kims (2014).
The reduced form errors e_t are usually assumed to have 'light tails', which is possible only if u_t has light tails. A model that satisfies these standard assumptions will be referred to as the LL (light-light) model hereafter. Under regularity conditions for least squares estimation, Â is √T consistent and asymptotically normal. The modeling issues that arise when one of the primitive shocks in a SVAR has infinite variance are best understood in the p = 1 and n = 2 case. Consider first a HH (heavy-heavy) model in which both shocks have heavy tails.
Lemma 1. Let {Z_t} be an IID sequence of random variables with Pareto-like tails (i.e., equation (1)) with index α ∈ (0, 2) and EZ_t = 0 if α > 1. If the sequence of constants {ψ_j} is such that Σ_{j=−∞}^{∞} |ψ_j|^δ < ∞ for some δ ∈ (0, α) ∩ (0, 1], then
i. the process X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j} exists with probability one and is strictly stationary.
ii. Let ρ̂(h) = Σ_{t=1}^{T−h} X_t X_{t+h} / Σ_{t=1}^{T} X_t^2 be the sample autocorrelation at lag h > 0 and suppose that Σ_{j=−∞}^{∞} |j| |ψ_j|^δ < ∞ for some δ ∈ (0, α) ∩ (0, 1]. Then for α ≠ 1,
(T/ln T)^{1/α} (ρ̂(h) − ρ(h)) ⇒ S_{α,h}/S_{α/2,0},
where ρ(h) = Σ_j ψ_j ψ_{j+h} / Σ_j ψ_j^2 and (S_{α/2,0}, S_{α,h}) are independent stable random variables with indices α/2 and α, respectively. If α > 1, then the latter convergence also holds when EZ_t ≠ 0, provided ρ̂(h) is replaced by its sample-mean-corrected version.
By restricting attention to 0 < α < 2, we only consider processes with infinite variance. Even though X_t is not covariance stationary (since E|X_t|^2 = ∞), part (i) states that the process X_t exists and is strictly stationary. The stated results for the sample covariance and sample autocorrelation are due to Davis and Resnick (1986, Theorem 3.3) and also hold when X_t is centered for α ∈ (1, 2). Note that the convergence of ρ̂(h) is faster than the √T rate obtained for finite innovation variance.
For VAR estimation, Lemma 1 can be used to derive the limits of the sample moments of the data. It then follows from the continuous mapping theorem that the least squares estimator is super-consistent. Though the analysis is straightforward, this setup is unappealing for macroeconomic data because if u_{t1} and u_{t2} both have infinite variance, Y_{t1} and Y_{t2} must also have infinite variance. But a typical economic time series does not resemble the series shown in Figure 1. Not only is the disaster series much less persistent, its kurtosis (over 70) is an order of magnitude larger than for variables like output growth, inflation, and interest rates. Our goal is a model in which (i) a heavy-tailed shock u_{t1} co-exists with light-tailed shocks u_{ti}, i = 2, ..., n, and (ii) Y_{ti} is influenced by the current and past values of u_{t1} but not dominated by them in a sense to be made precise. We consider the HL (heavy-light) model derived from the SVAR(p)
Y_t = A_1 Y_{t−1} + ... + A_p Y_{t−p} + B u_t, (5)
where for each h = 1, ..., p, A_h is an n × n matrix with (i, j)-th entry denoted A^{(h)}_{ij}, the coefficient of variable j at lag h in equation i. The entries B_{ij} of the n × n matrix B are similarly defined.
Assumption HL:
i. The sequence of n-dimensional random vectors {u_t} is iid and the components u_{ti}, i = 1, ..., n, are also independent. The shock u_{t1} has Pareto-like tails with index 1 < α < 2 and E[u_{t1}] = 0, while the remaining shocks u_{ti}, i = 2, ..., n, have thin tails with mean zero and variance 1.
ii. For i = 2, ..., n, the coefficients on the heavy-tailed variable are local to zero: A^{(h)}_{i1,T} = a^{(h)}_{i1} T^{−θ} for h = 1, ..., p, and B_{i1,T} = b_{i1} T^{−θ}, for some θ > 0. (8)
The primitive shocks u_{ti} are assumed to be independent across i and t, but this does not preclude time varying second moments, though it is stronger than the mutual orthogonality of u_t typically assumed in SVAR modeling. Assumption HL(i) restricts attention to processes with tail index 1 < α < 2 and thus excludes Cauchy shocks. The assumption that the thin-tailed shocks u_{ti}, i = 2, ..., n, have unit variance is without loss of generality, but it is important that their variances are finite. Since the variations of u_{t1} will dominate those of u_{ti}, i ≥ 2, when both are present, Y_{t1} will have heavy tails and exhibit the large spikes originating from u_{t1}. Assumption HL(ii) is motivated by the fact that Y_{ti} cannot have finite variance unless B_{i1} = 0 and A^{(h)}_{i1} = 0 for all h. But the dynamic effects of u_{t1} on Y_{t+h,i} would then be zero at all lags by assumption, rendering the empirical exercise meaningless. Thus, the Y_{ti} equations are modified to dampen the influence of u_{t1} on Y_{ti} at the rate θ given in (8). Letting A^{(h)}_{i1,T} and B_{i1,T} drift to zero is an asymptotic device to obtain this limit, but note that A^{(h)}_{i1,T} and B_{i1,T} are not time varying. Under Assumption HL(ii), Y_{t,T} is a triangular array that depends on T. To simplify notation, the explicit dependence on T is suppressed. A heavy-tailed linear time series must have a heavy-tailed shock as its primary source of variation, but it need not be exogenous. In our model, exogeneity would require that A^{(h)}_{1j} = B_{1j} = 0, j = 2, ..., n, in which case any feedback from Y_{ti}, i ≥ 2, to Y_{t1} would be disabled. But such a model would not shed light on how macroeconomic outcomes might mitigate or amplify the effects of disasters. Assumption HL allows A^{(h)}_{1j} and B_{1j}, j = 2, ..., n, to be free parameters to be estimated. Specializing to the n = 2 and p = 1 case with Eu_{t1}^2 = ∞ and Eu_{t2}^2 < ∞, we show in the Appendix that, under Assumption HL, the suitably normalized sample first and second moments of Y_{t2} converge to limits with stable distributions of index α and α/2, respectively. Thus the sample first and second moments of Y_{t2} have (random and possibly constant) limits even though one of its shocks has infinite variance. The implications for least squares estimation of the HL model can be summarized as follows.
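A bivariate HL design of this kind is easy to simulate and estimate equation by equation with least squares. The sketch below is illustrative only: the parameter values A, B, θ, and α are made up, the heavy shock is a centered Pareto(α) draw, and the dampening of the heavy shock in the light-tailed equation follows Assumption HL(ii):

```python
import numpy as np

rng = np.random.default_rng(2)

T, alpha, theta = 5000, 1.5, 0.3          # illustrative values
A = np.array([[0.3, 0.2], [0.1, 0.4]])
B = np.eye(2)

# HL(ii): dampen the influence of the heavy shock on the light-tailed equation
A_T = A.copy(); A_T[1, 0] *= T ** (-theta)
B_T = B.copy(); B_T[1, 0] *= T ** (-theta)

mean_u1 = alpha / (alpha - 1.0)           # mean of a Pareto(alpha) with x_m = 1
u1 = rng.pareto(alpha, size=T) + 1.0 - mean_u1  # heavy-tailed, mean zero, infinite variance
u2 = rng.standard_normal(T)               # light-tailed, unit variance
U = np.column_stack([u1, u2])

Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = A_T @ Y[t - 1] + B_T @ U[t]

# equation-by-equation OLS: regress Y_t on Y_{t-1}
A_hat = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)[0].T
print(np.round(A_hat, 3))                 # close to A_T
```

Repeating the simulation for several T shows the coefficient on the heavy-tailed lag concentrating faster than the √T rate, in line with the super-consistency discussed above.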
Proposition 1. Suppose that the data are generated by (5), and for tractability assume that n = 2 and p = 1. The convergence rate for Â_{11} is min((T/log T)^{1/α}, T^{1/2}), which is √T. This is slower than the rate for Â_{11} in the HH model because one of the infinite variance regressors in the HH model is replaced by one that has finite variance. The convergence rate for Â_{12} can be written as √T · T^{−θ}, which is slower than the √T rate for Â_{12} in the LL model because the variations in this equation are dominated by those from lags of Y_{t1}, hampering identification of A_{12}. The convergence rate for Â_{21} can be written as √T · T^{θ}, which is faster than the √T rate obtained for Â_{21} in the LL model. Hence in the HL model, the local parameter a_{21} is consistently estimable. In each case, the limit distribution is non-standard and not pivotal, so that construction of asymptotically correct confidence intervals is intractable. Since VAR estimates are obtained from least squares regressions on an equation-by-equation basis, Proposition 1 sheds light on the more general setting when a regressor has infinite variance but the dependent variable has finite variance. Though such a regression would be 'unbalanced' in the standard setup, the coefficient on the heavy-tailed variable is scaled down to accommodate the heavy-tailed shock in our HL setup. The coefficient estimate on the infinite variance regressor would be consistent but not asymptotically normal. By implication, the impulse response coefficients, whether computed from the VAR or by local projections, would likely not be asymptotically normal. The structural moving-average representation of the model is
Y_t = Ψ(L) u_t,
where Ψ(L) = A(L)^{−1} B and Ψ_0 = B. The effects of u_{t1} on Y_{t+h,2} are given by the first column of Ψ_h, which depends on A and B. Hence to estimate the dynamic causal effects of u_{t1}, we need to be able to consistently estimate B when u_{t1} has infinite variance.
The relationship between the vector of primitive shocks u_t and the error terms e_t is
e_t = B u_t, (9)
where u_t = (u_{t1}, ..., u_{tn})' is an n-vector consisting of independent random variables with mean zero and B is an n × n matrix with inverse W. As is well known, B is not uniquely identified from the second moments of e_t alone, even when e_t has finite variance, because (BQ')(Q u_t) has the same covariance structure as B u_t for any orthonormal matrix Q.
Lemma 2. Let e = Bu, where u is an n × 1 vector of mutually independent components of which at most one is Gaussian, and B is an n × n invertible matrix with inverse W = B^{−1}. If the components of û = Ŵ e are pairwise independent, where Ŵ is an invertible matrix, then Ŵ = P Λ W, where P is a permutation matrix and Λ is a diagonal matrix. Further, the components of û must be mutually independent.
Proof. The proof of this result follows directly from the Skitovich-Darmois theorem, as described in the proof of Theorem 10 in Comon (1994). Since û = Ŵ B u =: G u, the components of û can be written as û_i = Σ_j G_{ij} u_j. The independence of the components û_i and û_j implies that G_{i1} G_{j1} = 0 for i ≠ j. That is, the first column of G contains at most one nonzero value. A similar conclusion holds for all the columns of G. Hence G is the product of a permutation matrix P times a diagonal matrix Λ = diag{λ_1, ..., λ_n}, i.e., G = P Λ. In other words, Ŵ W^{−1} = P Λ, or Ŵ = P Λ W, as was to be shown. It follows from the form of G that the components of û must be mutually independent.
Independence of u narrows the class of observationally equivalent models to those characterized by permutations of rows and changes of scale/sign. As discussed in Gouriéroux, Monfort, and Renne (2017), scale changes are responsible for the failure of local identification, a problem that can be dealt with by normalizing the shocks so that Λ is an identity matrix. Failure of global identification arising from permutation and sign changes requires additional assumptions.
It is only when the restrictions are correctly imposed that P is also an identity matrix, in which case Ŵ = W. We also need to impose restrictions on W to identify a component of u as a disaster shock. Our problem is non-standard because the shock of interest has a heavy tail, but this distinctive feature actually helps identification. We reorder the components by their tail-heaviness and take the disaster shock to be the first component, which is also the one with the largest kurtosis. In practice, the variables in the estimated u will be ordered by sample kurtosis. As seen in (4), this ordering is consistent with ordering the components of û by tail-heaviness. Independent components analysis (ICA) is widely used to identify a linear mixture of non-Gaussian signals. Whereas PCA uses the sample covariance to find uncorrelated signals, ICA typically uses properties of the random vector that go beyond second moments in order to separate the independent signals.10 In the ICA literature, B is known as the mixing matrix and W the unmixing matrix. ICA has been applied to finite variance SVARs in which global identification is achieved by imposing additional restrictions such as lower triangularity of B.11 There exist many ICA estimators for identifying the source process, which in our case corresponds to the primitive shocks u. Some procedures evaluate negative entropy (also known as negentropy) and take as the solution the W that maximizes the non-Gaussianity of W e_t, while others maximize an approximate likelihood using, for example, log-concave densities. The popular fastICA algorithm of Hyvärinen, Karhunen, and Oja (2001) is a fixed-point algorithm for pseudo maximum-likelihood estimation. A different class of procedures takes as its starting point that if the signals are mutually independent at any given t, their joint density, if it exists, factorizes into the product of their marginals.
This suggests evaluating the distance between the joint density and the product of the marginals.12 Chen and Bickel (2006) form a distance measure between the joint characteristic function and the product of the marginal characteristic functions to estimate the unmixing matrix. The advantage of this procedure is that it does not rely on the existence of joint densities or moments. In case the vector has finite second moments, they obtain a convergence rate of 1/√T for this nonparametric estimate of W, the same as the one obtained in Gouriéroux, Monfort, and Renne (2017) for parametric estimation. Matteson and Tsay (2017) use a distance covariance approach to extract the independent sources under the assumption that they have finite variances, which is similar in spirit to the method of Chen and Bickel (2006). Following Chen and Bickel (2006), we assume that the parameter space of unmixing matrices is given by Ω, consisting of invertible matrices W for which a) each row has norm 1; b) the element with maximal modulus in each row is positive; c) the rows are ordered by ≺. Further, it is assumed that the true unmixing matrix W_0 ∈ Ω. However, we will reorder the rows of the estimated W according to largest sample kurtosis. The disaster shock with infinite variance will correspond to the first row of W. We will also use the distance covariance approach because, as shown in the companion paper Davis and Fernandes (2022), it is also valid when one component of u has infinite variance. The distance covariance between two random vectors X and Y of dimensions m and n, respectively, is
I(X, Y; w) = ∫ |ϕ_{X,Y}(s, t) − ϕ_X(s) ϕ_Y(t)|^2 w(s, t) ds dt, (10)
where w(s, t) > 0 is a weight function and ϕ_Z(t) = E[exp(i⟨t, Z⟩)], t ∈ R^d, denotes the characteristic function of any random vector Z ∈ R^d. The most commonly used weight function, which we will also adopt here, is
w(s, t) = (c_{m,β} c_{n,β} |s|^{β+m} |t|^{β+n})^{−1}, (11)
where β ∈ (0, 2) and c_{m,β} = 2π^{m/2} Γ(1 − β/2) / (β 2^β Γ((β + m)/2)) (see Székely, Rizzo, and Bakirov (2007)). The integral in (10) is then finite provided E|X|^β + E|Y|^β < ∞.
Under this moment assumption, one sees immediately that X and Y are independent if and only if I(X, Y; w) = 0, since in this case the joint characteristic function factors into the product of the respective marginal characteristic functions, ϕ_{X,Y}(s, t) = ϕ_X(s) ϕ_Y(t) for all (s, t) ∈ R^{m+n}. Based on data (X_1, Y_1), ..., (X_T, Y_T) from (X, Y), the general distance covariance in (10) can be estimated by replacing the characteristic functions with their empirical counterparts. Using the w given in (11) and assuming E[|X|^β |Y|^β] < ∞, there is an explicit formula for Î (see (20)) that avoids direct computation of the associated integral. Additional background on distance covariance can be found in the Appendix. Based on a sample e_t = (e_{t1}, ..., e_{tn})', t = 1, ..., T, an estimate of the unmixing matrix W is found by minimizing the objective function
Î_T^M(W) := Î(S_1, S_{2:n}) + Î(S_2, S_{3:n}) + ... + Î(S_{n−1}, S_n), (13)
subject to W ∈ Ω, where Î is the empirical estimate of I with weight function (11), computed using S_t = W e_t, t = 1, ..., T. Matteson and Tsay (2017) show that this procedure produces a consistent estimate of W when the variance of the S_t is finite. The proof is based on rewriting I(·) in terms of V-statistics and presumes that terms of the form E|XY| are finite. In our case of infinite variance, I(X, Y) is finite even if E|XY| = ∞; one only needs that E|X| + E|Y| < ∞. More recently, it is shown in Davis and Fernandes (2022) that consistency of Ŵ based on the sample distance covariance also holds in the infinite variance case. This result justifies the use of the objective function Î_T^M(W) for estimating the unmixing matrix in the finite mean but infinite variance case. In case the mean is infinite, one can choose β < 1 in the weight function to ensure that the moment condition E|e_t|^β < ∞ is met. In most ICA estimation procedures, the first step is typically to prewhiten the output.
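For scalar X and Y with the β = 1 weight, the explicit sample formula alluded to above reduces to a double-centering of the pairwise distance matrices (the V-statistic form of Székely, Rizzo, and Bakirov (2007)). A minimal sketch; the simulated inputs are illustrative:

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance for scalar series (beta = 1 weight)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # double-center: A_ij = a_ij - row mean - column mean + grand mean
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()                 # (1/T^2) * sum_ij A_ij * B_ij

rng = np.random.default_rng(0)
x, y = rng.standard_normal(500), rng.standard_normal(500)
print(dcov2(x, y))       # near zero: x and y are independent
print(dcov2(x, x + y))   # clearly positive: dependence is detected
```

Note the statistic requires only first moments of the inputs, which is what makes it usable when one component has infinite variance.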
In effect, prewhitening removes second moment correlations prior to estimating the independent components. In the context of a SVAR with finite variance, suppose we have observations e_1, ..., e_T from the model e_t = B u_t. Denote the sample covariance matrix of the e_t's by Σ̂_e. The prewhitened series Σ̂_e^{−1/2} e_t is then rotated so as to minimize (13), i.e.,
Ô = argmin_{O ∈ O(n)} Î_T^M(O Σ̂_e^{−1/2}),
where the minimization is over all O ∈ O(n), the space of n-dimensional orthogonal matrices. This produces an estimate of the unmixing matrix given by Ô Σ̂_e^{−1/2} that is a consistent estimate of W_0 after suitable rescaling and row permutation, as noted in Remark 1. The optimization over orthogonal matrices reduces the number of unknowns from n^2 to n(n − 1)/2. The fact that this prewhitening step actually works in the infinite variance case follows directly from Theorem 3.2 in Davis and Fernandes (2022) (see also Chen and Bickel (2006)), which we record in the following proposition.
Proposition 2. Consider observations e_t, t = 1, ..., T, from the ICA model (9), where W = W_0 ∈ Ω is the true unmixing matrix, the components of u_t are mutually independent, at most one has infinite variance, at most one component is normal, and none of the components is degenerate. Then, setting Ŵ = [Ô Σ̂_e^{−1/2}]_Ω, the rescaled and row-permuted version of Ô Σ̂_e^{−1/2} in Ω, we have Ŵ →_p W_0.
Although we have used the SVD version of Σ̂_e^{−1/2} in Proposition 2, we could also use the Choleski analogue, which is often an attractive alternative. This is especially true for SVARs since it is already widely used to identify a lower triangular structure of B. Though the population covariance matrix of e_t does not exist in the infinite variance case, a decomposition of the sample covariance matrix is possible; a lemma giving the decomposition for the n = 2 case, together with its proof, appears in Section 7.2 of the Appendix. The prewhitened variables e^c_t remain a function of u_{t1} and u_{t2}, which we seek to identify.
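The Choleski prewhitening step is mechanical: factor the sample covariance as Σ̂_e = L L' and apply L^{−1} to the errors, after which the sample covariance is exactly the identity. A sketch with an illustrative mixing matrix and Laplace/normal sources (stand-ins, not the paper's shocks):

```python
import numpy as np

rng = np.random.default_rng(3)

# e_t = B u_t with independent, non-Gaussian-dominated sources
T = 2000
B = np.array([[1.0, 0.0], [0.5, 1.0]])
u = np.column_stack([rng.laplace(size=T), rng.standard_normal(T)])
e = u @ B.T

# Choleski prewhitening: e_c = L^{-1} e has identity sample covariance
Sigma = np.cov(e, rowvar=False, bias=True)
L = np.linalg.cholesky(Sigma)
e_c = np.linalg.solve(L, e.T).T

print(np.round(np.cov(e_c, rowvar=False, bias=True), 6))  # identity matrix
```

The prewhitened e_c is uncorrelated but generally not independent; the remaining rotation is what the distance covariance minimization in (13) pins down.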
Observe that if B were lower triangular, then e^c_{t1} would depend only on u_{t1}, since e_{t1} = B_{11} u_{t1} + B_{12} u_{t2} and B_{12} = 0 in that case. But note that the Choleski decomposition is used here only as a prewhitening device and not as a way to achieve identification. If the ordering is incorrect, ICA will undo the ordering to find the u satisfying the additional identification restrictions. In practice, of course, we do not observe the errors e_t directly but rather the estimated versions ê_t = (ê_{t1}, ..., ê_{tn})', t = 1, ..., T. Limit distributions of the distance covariance function can be slightly different when applied to the ê_t rather than the actual errors (see Davis, Matsui, Mikosch, and Wan (2018)). Interestingly, in the heavy-tailed case, the limit theory for the distance covariance based on estimated and actual residuals is the same. In the context of consistency in the estimation of the unmixing matrix, the same procedure can be carried out as above using the estimated residuals ê_t = e_t + (A − Â) Y_{t−1}. The proof is given in the appendix. The idea is that the sample residuals can be represented by an ICA model with noise, i.e., ê_t = B u_t + v_t, where the noise is the sampling error v_t = (A − Â) Y_{t−1}. It is then shown that the difference between Σ̂_e^{−1} (from the noiseless model) and Σ̂_ê^{−1} (from the noisy model) converges to zero in probability and thus has asymptotically negligible effects on the objective function that estimates W. Applying Theorem 3.3 in Davis and Fernandes (2022) for ICA with noise gives the stated result. The dynamic properties of a SVAR are determined by the restrictions imposed on the model and are generally difficult to test. But if u_t is independent, then by Lemma 2, the identifying restrictions are testable.
Tests exploiting non-Gaussianity of the shocks have been considered in Lanne, Meitz, and Saikkonen (2017). We also test independence of û, but independence of ê is also of interest because if the components of e_t = B u_t were already independent, then by Lemma 2, B would be diagonal and no further analysis of the structure of W would be required. An independence test of ê is thus informative about its unrestricted structure. In contrast, independence of û is informative about the structure implied by the identifying restrictions. If û_t should fail an independence test, there would be no point in further analyzing the impulse responses.

Measures of multivariate association are reviewed in Josse and Holmes (2016). A test using the empirical version of the aggregated distance covariance I_T^M defined in (13) can in principle be used. Even though T Î_T^M has a limit distribution under the null hypothesis of independent components, the limit distribution is generally intractable. Hence direct use of the limit distribution for calculating cutoff values for the test statistic is infeasible. However, as pointed out in Matteson and Tsay (2017), one can instead calculate the test statistic Î_T^M for random permutations of the data. A permutation-based test for independence is founded on the idea that if there is dependence among the components, then the value of Î_T^M should be larger than the corresponding statistics based on random permutations of the components, in which the dependence among the components has been severed by the permutation. The test is known to control the Type I error and is also robust to the possibility of heavy tails. Precisely, if S_1, …, S_T is an iid sample of random vectors of dimension n, then the permutation procedure is implemented as follows: for b = 1, …, N_P, each component series of the sample is independently permuted and the statistic Î_T^M is recomputed from the permuted sample. The test is distribution free under the null hypothesis. The p-value of the test is constructed as p = (k + 1)/(N_P + 1), where k is the number of Î_T^M values from the N_P permuted samples that exceed the Î_T^M computed from the original sample. The test is implemented in the R package steadyICA with a default N_P value of 199.
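The permutation procedure can be sketched for a single pair of components. The distance covariance below is the standard sample version of Székely, Rizzo, and Bakirov (2007) with β = 1; the shocks are simulated t-distributed draws and the mixing matrix is illustrative:

```python
import numpy as np

def distance_covariance(x, y):
    """Sample (squared) distance covariance of Székely, Rizzo, Bakirov (2007)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])        # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # double-center each distance matrix
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

def perm_pvalue(x, y, n_perm=199, seed=0):
    """Permutation p-value (k + 1)/(N_P + 1): permuting y severs dependence."""
    rng = np.random.default_rng(seed)
    stat = distance_covariance(x, y)
    k = sum(distance_covariance(x, rng.permutation(y)) >= stat
            for _ in range(n_perm))
    return (k + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
u = rng.standard_t(df=3, size=(500, 2))        # independent fat-tailed shocks
e = u @ np.array([[1.0, 0.0], [0.8, 1.0]]).T   # mixed residuals are dependent

print(perm_pvalue(u[:, 0], u[:, 1]))   # typically large: not rejected
print(perm_pvalue(e[:, 0], e[:, 1]))   # small: independence rejected
```

The aggregated statistic Î_T^M in the text combines such pairwise measures across all components; the sketch shows only the n = 2 building block.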
We reject independence of the components in S if the p-value is less than a prescribed nominal size. In principle, the null hypothesis of independence can be rejected because u_t is not independent, or because the identifying restrictions are incorrect, or both. But under the maintained assumption that the components of u are mutually independent, the test provides a validation of the (overidentified or exactly identified) restrictions on B (or W).

The dynamic effects of a disaster shock can be analyzed as follows. Step 1 estimates the coefficients of a VAR model by least squares. Step 2 prewhitens the VAR residuals. Step 3 applies ICA to obtain independent components and associates the component with the largest kurtosis with the disaster shock. Step 4 estimates the impulse response functions, which trace out the dynamic effects after h periods.

From a given B_init that is either NLT or LT, its inverse yields a non-normalized W_init, from which a normalized W is formed by imposing the constraint that each row sums to one. Then B = W^{−1} is used to simulate data that are subsequently estimated. Innovations u_t: the first innovation specification (denoted HL) has one heavy-tailed shock, while in the second specification (denoted LL), all three shocks have light tails. In both cases, the shocks are ordered such that u_1 has the largest kurtosis and u_3 the smallest.

Prewhitening: Let e^0 = (e_1, e_2, e_3) and e^1 = (e_2, e_3, e_1) denote two assumed orderings, with estimated covariances cov(e^0) and cov(e^1) based on samples from each vector, respectively.
i. ẽ^0 = e^0 P_0^{−1}, P_0 = chol(cov(e^0)).
ii. ẽ^1 = e^1 P_1^{−1}, P_1 = chol(cov(e^1)).
iii. ẽ^2 = e^0 P_svd^{−1}, P_svd = U D^{1/2} U′, where svd(cov(e^0)) = U D U′.

The results in Panel A of Table 1 assume that e_t is observed. Regardless of the specification for u_t and B, the Type I errors associated with u_t or u_t^2 are close to the size of the test.
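The kurtosis-ordered Choleski prewhitening in item i can be sketched as follows (simulated residuals; the heavy-tailed component and the mixing matrix are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical VAR residuals: component 0 is heavy-tailed (largest kurtosis).
T = 2000
u = np.column_stack([rng.standard_t(df=2.5, size=T),
                     rng.normal(size=T),
                     rng.uniform(-1.7, 1.7, size=T)])
B = np.array([[1.0, 0.0, 0.0],
              [0.4, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
e = u @ B.T

def kurt(z):
    """Sample kurtosis (non-excess) of each column."""
    z = z - z.mean(axis=0)
    return (z ** 4).mean(axis=0) / (z ** 2).mean(axis=0) ** 2

# Order the residual series by descending sample kurtosis ...
order = np.argsort(kurt(e))[::-1]
e_ord = e[:, order]

# ... then prewhiten with the lower-triangular Choleski factor P_0.
P0 = np.linalg.cholesky(np.cov(e_ord, rowvar=False))
e_white = np.linalg.solve(P0, e_ord.T).T        # e_ord @ inv(P0)'

print(np.allclose(np.cov(e_white, rowvar=False), np.eye(3), atol=1e-8))
```

As in item iii, replacing the Choleski factor with U D^{1/2} U′ from the SVD gives the symmetric-square-root variant on the same ordered data.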
However, since the components of u_t are non-Gaussian by construction, the test always rejects independence of e^0_t. Recall that ẽ^0_t is constructed from a Choleski decomposition of the sample covariance matrix of e^0_t. Independence of ẽ^0_t is always rejected when the data are generated from Model NLT, but it is almost never rejected for Model LT because W is lower triangular in Model LT. The prewhitened data ẽ^1_t and ẽ^2_t are based on Ŵ matrices that differ from W, and hence the test also rejects independence. The top right panel shows that the permutation test does not reject independence of the signals û(ẽ) recovered by ICA, except in Model LT-HL, where the test rejects with probability 0.145 in the Monte Carlo and is thus slightly oversized.

The above results assume that e^0 is observed. Next, we replace e^0 with residuals from estimation of a VAR with one lag. ICA is then applied to the estimated residuals after prewhitening. Panel B of Table 1 shows that the rejection probabilities of the permutation test are not affected by having to estimate A and B by least squares. As in the case when e is observed, the permutation test cannot reject independence of the primitive shocks identified by ICA, except in the LT-HL case, where the rejection probability is 0.139.

A metric for comparing two p × p matrices A_0 and A is the Amari distance, defined in Bach and Jordan (2001). Though ICA studies usually report the Amari distance for the unmixing matrix W, the matrix B is of more interest in a SVAR since it gives the impact responses of the shocks. We compare the absolute values of the two matrices to ensure that differences are not due to a sign flip, which is difficult to control in simulations. Panel C of Table 1 shows that all prewhitening methods give similar Amari distances, except in Model LT-HL, where using ê^0 gives noticeably smaller errors.
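A sample implementation of an Amari-type distance is below; the normalization is one common convention (cf. Bach and Jordan (2001)), and other scalings appear in the literature. The matrices used are illustrative:

```python
import numpy as np

def amari_distance(A0, A):
    """Amari-type distance between two invertible p x p matrices.

    Zero iff A equals A0 up to row scaling and permutation, which are
    exactly the indeterminacies of ICA.  The 1/(2p(p-1)) normalization
    is one common convention; other scalings appear in the literature.
    """
    R = np.abs(np.asarray(A) @ np.linalg.inv(A0))
    p = R.shape[0]
    row = (R.sum(axis=1) / R.max(axis=1) - 1).sum()
    col = (R.sum(axis=0) / R.max(axis=0) - 1).sum()
    return (row + col) / (2 * p * (p - 1))

B0 = np.array([[1.0, 0.0], [0.5, 1.0]])
# Rescaling and permuting the rows leaves the distance at zero ...
perm_scaled = np.array([[1.5, 3.0], [-2.0, 0.0]])  # rows: 3*row2, -2*row1
print(amari_distance(B0, perm_scaled))             # ~0
# ... while a genuinely different mixing matrix does not.
print(amari_distance(B0, np.array([[1.0, 0.4], [0.1, 1.0]])))  # > 0
```

Because the distance ignores scale and permutation, it compares estimated and true B (or W) on exactly the features ICA can hope to recover.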
The results suggest that the method of prewhitening matters, but only in the LT-HL case, and there are two possible explanations. One is that in the LT-HL case the true B (hence W) is lower triangular, and when this structure is accompanied by a heavy-tailed shock, much can be learned from a kurtosis ordering of the VAR residuals; prewhitening without using this information is inefficient. The second explanation is that, as seen from Panel B, independence of ẽ^0 cannot be rejected. This suggests that it is desirable to use prewhitened data that are as close to independent as possible for ICA estimation. Comparing the p-values of the permutation test applied to different sets of prewhitened data can be useful in this regard. The results thus favor prewhitening the VAR residuals by a Choleski decomposition ordered by kurtosis.

A closer look finds that the A and B matrices are precisely estimated when ê^0 is used as the prewhitened data. Even without imposing a lower triangular structure, the pattern is recovered precisely whether or not the innovations have heavy tails. The difference compared to the Choleski decomposition is that ICA lets the data speak as to whether the upper triangular entries of B are zero. If the lower triangular structure is true, Y_1 is exogenous, and one can alternatively estimate the dynamic causal effects from a regression of Y_2 on Y_1 and lags of Y_1, Y_2, Y_3.

We consider three applications. The first aims to show that the validity of the ordering used in the Choleski decomposition can be tested, as suggested by Lemma 2. The second application estimates an HL model to shed light on the dynamic effects of a disaster shock. In the third, HL regressions are used to purge the variations due to covid-19 from the data.

Economic theory is inconclusive as to whether episodes of heightened uncertainty during economic downturns arise because of exogenous increases in uncertainty, or whether they are the consequence of endogenous responses to other economic shocks.
SVARs have been estimated using a variety of identification strategies, different measures of uncertainty, and different samples. But testing the validity of these restrictions has been difficult because these models are often exactly identified, i.e., the number of unique entries in the covariance matrix of e_t equals the number of free parameters in B. An independence test provides a way to test these restrictions.

We take industrial production (IP) as an indicator of real activity and consider six different measures of uncertainty used in Ludvigson, Ma, and Ng (2021b). These are JLN macro uncertainty (UM), real economic uncertainty (UR), financial uncertainty (UF), policy uncertainty (EPU), news-based uncertainty (EPN), and stock market volatility (VIX). This leads to the estimation of six three-variable SVARs, each using six lags, over the sample 1960:7-2015:4. Table 2 shows that the data used in the six systems have different statistical properties. However, there is little evidence that the systems considered have heavy tails.

Note: ip is industrial production. Six measures of uncertainty are considered: macro (um), real (ur), financial (uf), policy uncertainty (epu), news uncertainty (epn), and stock market volatility index (vix). See Jurado, Ludvigson, and Ng (2015) and Ludvigson, Ma, and Ng (2021b) for definitions.

We test independence of the identified shocks obtained from different orderings of the VAR residuals. Recall that the p-value indicates the Type I error in rejecting the assumed lower triangular structure. The p-values reported in Table 2 indicate strong evidence against independence of the shocks constructed from Models 2, 4, and 5, regardless of ordering. There is some support for independence when financial uncertainty is ordered first in Models 1 and 6, while the strongest evidence for independence is provided by Model 1 using the ordering (ip, uf, um), a configuration that would not be obvious based on economic reasoning.
As Lemma 2 indicates, independence is necessary but not sufficient for model identification. Nonetheless, testing independence of û provides a way to rule out incorrect restrictions. The finding that independence of shocks from multiple orderings cannot be rejected suggests that the restrictions imposed by the Choleski orderings are not enough to uniquely identify u. This lends support to using restrictions beyond the ordering of variables to help identification.

The second example considers a SVAR in the cost of disasters series (CD) shown in Figure 1. Since the estimates have non-standard distributions, we use (*) to indicate that zero is outside the (10, 90) percentiles of the bootstrap distribution. The matrix A_1 gives the lag-one response to a disaster shock. The estimates indicate that the responses of CD and UM are both non-zero. The A_{13} estimate suggests that the costly disaster series is not strictly exogenous. The matrix A(1) = ∑_{j=1}^{p} A_j summarizes the cumulative effects of the shocks over six periods. The (1,1) diagonal entry of Â(1) indicates that the disaster shock has a short half-life. The B matrix gives the instantaneous effect of the disaster shock. The unconstrained ICA estimate is quite close to the one implied by a Choleski decomposition with a (1,2,3) ordering. Taking sampling uncertainty into account, ICA supports a B matrix that is more sparse than the lower triangular structure imposed by the Choleski decomposition. Note also that the HL model is based on the premise that the effects of an infinite variance shock on a finite variance variable are small. Rows two and three of the first column of B and A_k are small relative to the own effect recorded in the (1,1) entry of the respective matrices. The estimates are consistent with the HL structure. Figure 2 shows that the shock has a heavy right tail.
We estimate the impulse response functions by (i) iterating Â^h B̂ as implied by the VAR, (ii) local projections using û obtained from ICA as shocks, and (iii) the dynamic responses implied by the Choleski decomposition. These are labeled var, lp, and chol in Table 3. To provide some idea of the precision of the estimated impulse responses, we report standard errors for the lp estimates, as well as 95% bootstrap confidence intervals computed using the vars package in R, as rough guides. But note that our estimates have non-standard distributions, and results on bootstrap inference with heavy-tailed variables are available only in the univariate setting; see, for example, Davis and Wu (1997) and Wan and Davis (2022). Each local projections regression is of the HL type, and hence inference is also non-standard. The standard errors should be interpreted with this caveat in mind.

The CD series has short memory, and the effects of its own shock die out after one month. The shock induces a tightly estimated increase in uncertainty for three months and an increase in unemployment claims for two months. As a point of reference, an unemployment claims shock has an impact effect of 14.556 on itself, and an uncertainty shock has an impact effect of 8.898 on itself. The effects of a disaster shock on these economic variables are small, but they do exist. This reinforces the motivation of the HL model: infinite variance shocks can affect variables with finite variances. It can be argued that the infinite variance nature of u_{t1} makes the unit variance property of shocks identified by ICA unappealing. But it is easy to calibrate the shock to yield exactly a one percent change in the variable of interest,13 without changing the shape of the impulse response function. With this data, the unit effect is associated with a shock of size 13.881, which is slightly larger than the Katrina shock in 2005, which was of magnitude 11.56.
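Method (i) can be sketched directly: writing a VAR(p) in companion form with companion matrix A_c and selection matrix J, the horizon-h response is Ψ_h = J A_c^h J′ B. A minimal implementation with illustrative coefficient matrices (not the estimates in Table 3):

```python
import numpy as np

def irf_var(A_list, B, horizon):
    """Impulse responses Psi_h = J A_c^h J' B from the companion form
    of a VAR(p), where A_list = [A_1, ..., A_p]."""
    p, n = len(A_list), B.shape[0]
    # companion matrix stacks the lag matrices over an identity block
    A_c = np.zeros((n * p, n * p))
    A_c[:n, :] = np.hstack(A_list)
    A_c[n:, :-n] = np.eye(n * (p - 1))
    J = np.hstack([np.eye(n), np.zeros((n, n * (p - 1)))])
    out, P = [], np.eye(n * p)
    for h in range(horizon + 1):
        out.append(J @ P @ J.T @ B)   # Psi_h = J A_c^h J' B
        P = P @ A_c
    return np.array(out)

# illustrative two-variable VAR(2) with a lower-triangular impact matrix
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
B = np.array([[1.0, 0.0], [0.3, 1.0]])
psi = irf_var([A1, A2], B, horizon=12)
print(psi[0])            # impact responses equal B
```

The first two moving-average coefficients are Ψ_1 = A_1 B and Ψ_2 = (A_1^2 + A_2) B, matching the usual VAR recursion.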
covid-19 has been costly in health, social, and economic dimensions, but it has also created new challenges for data analysis. One problem, discussed in Ng (2021), is that covid-19 is pervasive and persistent, and the principal components of economic data will now be spanned by both common economic variations and covid-19. To isolate the economic factors, a suggestion was made to project each economic variable on covid indicators such as the positivity rate, hospitalizations, and deaths, and then use the panel of 'de-covid' data to estimate the economic factors.

covid-19 also has implications for VAR estimation. Consider a two-variable VAR in log payroll employment (PAYEMS) and log consumption of durables (CD). The top panel of Figure 3 shows the response to a positive employment shock from a VAR estimated over the pre-covid sample of 1960:1-2020:2, while the second panel extends the sample to 2020:12. Adding ten months of post-covid data completely changes the shape of the impulse response functions. Lenza and Primiceri (2021) recognize that the covid-induced spikes in the data will distort VAR estimation and suggest using Pareto priors for the innovation variances to capture these spikes. Others, such as Carriero, Clark, Marcellino, and Mertens (2021), model covid-19 as outliers. Instead of specifying changes to the probability distribution of existing shocks, an alternative is to assume, as in Ng (2021), that there is an additional 'virus' shock, say v, in the post-covid sample. There are then two ways to proceed. The first is to de-covid all variables used in the VAR, which would entail running n(p + 1) de-covid regressions. By Frisch-Waugh arguments, this is the same as adding the covid indicators as exogenous variables to each equation. Note that this is not the same as running a VAR on n de-covid variables, which would entail only n de-covid regressions. Results using the log changes in positive cases as v are shown in the third panel of Figure 3.
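The Frisch-Waugh equivalence can be checked numerically: adding the covid indicator as an exogenous regressor to each equation yields the same autoregressive coefficients as first 'de-coviding' the regressand and every (current and lagged) regressor. A sketch with simulated series (the indicator v and both variables are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400
v = rng.normal(size=T)                       # hypothetical 'virus' indicator
y = np.zeros((T, 2))
for t in range(1, T):                        # two-variable VAR(1) plus v
    y[t] = 0.5 * y[t - 1] + np.array([1.0, 0.4]) * v[t] + rng.normal(size=2)

Y, Ylag, V = y[1:], y[:-1], v[1:]

def ols(X, Z):
    return np.linalg.lstsq(X, Z, rcond=None)[0]

# (a) covid indicator added as an exogenous regressor in each equation
X = np.column_stack([Ylag, V])
A_exog = ols(X, Y)[:2]

# (b) de-covid the regressand and the regressors first, then run the VAR
resid = lambda Z: Z - np.outer(V, ols(V[:, None], Z))
A_decovid = ols(resid(Ylag), resid(Y))

print(np.allclose(A_exog, A_decovid))        # Frisch-Waugh: identical
```

By contrast, de-coviding only the n variables and then lagging the cleaned series is not numerically the same regression, which is the distinction drawn in the text.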
The dynamic responses are very similar to the ones in the top panel estimated on the pre-covid sample. Removing the covid variations from the data before VAR estimation suppresses feedback from the economic variables to v, which could be restrictive. An alternative approach is to include a v indicator in the VAR directly and order it first, resulting in an HL model with (n + 1) variables. In this case, interest is not in the dynamic effects of an infinite variance shock, but in isolating the economic variations so that the dynamic effects of economic shocks can be estimated in spite of the presence of covid-19. The results for the three-variable VAR in the bottom panel of Figure 3 are again similar to the two-step approach in the third panel. Whichever way we choose to control for covid variations, the exercise involves regressions with a finite variance variable on the left-hand side and a heavy-tailed variable on the right-hand side, and Proposition 1 is relevant to the interpretation of the estimates.

This paper provides a VAR framework that accommodates disaster-type events. The framework can be used to study the effects of disaster-type shocks, as well as the effects of finite variance shocks in the presence of large rare events. Under the maintained assumption that the primitive shocks are independent, a disaster-type shock can be uniquely identified from the tail behavior and sign of the components estimated by ICA. An independence test for the validity of the identifying restrictions is also proposed. The test is valid even for exactly identified models and is of interest in its own right. The focus here has been on developing the HL framework and consistent estimation. Inference when the data have heavy tails remains an area for future research.

[Figure 3 panel titles: Bivariate VAR, Purging Covid Effects; Three-variable VAR, Post-Covid]

If w_i is a probability density function, then application of Fubini requires no further conditions on the distributions of X and Y.
In order to avoid direct integration in (16), one can choose functions w_i that have an easily computable Fourier transform. Examples include the Gaussian density, for which ŵ_i(x) = exp{−σ^2 x^2/2}, or a Cauchy density, in which case ŵ_i(x) = exp{−σ ‖x‖_1}, where ‖x‖_1 is the 1-norm. A popular choice for w is w(s, t) = (c_{m,β} |s|^{β+m} c_{n,β} |t|^{β+n})^{−1}, where β ∈ (0, 2) and c_{m,β} = 2π^{m/2} Γ(1 − β/2) / (β 2^β Γ((β + m)/2)) (see Székely, Rizzo, and Bakirov (2007)). In this case, one has

∫_{R^m} c_{m,β}^{−1} (1 − cos⟨s, x⟩) / |s|^{β+m} ds = |x|^β,

and provided E|X|^β + E|Y|^β + E(|X|^β |Y|^β) < ∞,

I(X, Y; w) = E[|X − X′|^β |Y − Y′|^β] + E|X − X′|^β E|Y − Y′|^β − 2 E[|X − X′|^β |Y − Y″|^β].   (19)

Notice that with this choice of w, I is invariant under orthogonal transformations of X and Y and is scale homogeneous under positive scaling. The most common choice for β is the value 1, which requires a finite mean. In our heavy-tailed framework, we have assumed the tail index α ∈ (1, 2), so the integral in (16) is finite and formula (19) is valid (if X and Y are independent) using the above weight function w with β = 1. However, in order to extend the results to heavier tails, such as the Cauchy, one can either choose a smaller β, which is difficult to identify in practice, or use the Gaussian density function. As noted in Davis, Matsui, Mikosch, and Wan (2018), the weight function in (11) can have potential limitations when applied to estimated residuals in the finite variance case. Based on data (X_1, Y_1), …, (X_T, Y_T) from (X, Y), the general distance covariance in (16) can be estimated by replacing the characteristic functions with their empirical counterparts. Using the w given in (18), we obtain the estimate Î(X, Y; w), which can be shown to be consistent for I(X, Y; w) by the ergodic theorem applied to the empirical characteristic functions.
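The constant c_{m,β} can be sanity-checked numerically. For m = 1 and β = 1, c_{1,1} = 2π^{1/2} Γ(1/2)/(2 Γ(1)) = π, and the identity ∫_R c_{1,1}^{−1}(1 − cos(sx))/s^2 ds = |x| can be verified with a rough trapezoid quadrature (the grid and truncation point are arbitrary choices):

```python
import numpy as np

# Check: with c_{1,1} = pi, (1/pi) * int_R (1 - cos(s*x)) / s^2 ds = |x|.
x = 1.5
s = np.linspace(1e-8, 1000.0, 2_000_000)   # grid on (0, L]; integrand is even
f = (1.0 - np.cos(s * x)) / s ** 2
ds = s[1] - s[0]
integral = (f.sum() - 0.5 * (f[0] + f[-1])) * ds   # trapezoid rule
val = 2.0 * integral / np.pi               # factor 2 for the negative half-line
print(val)                                 # approx 1.5 = |x|
```

The truncation at L = 1000 costs roughly 2/(πL), so the result matches |x| to about three decimal places.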
The limit theory for T Î(X, Y; w) under the assumption that X and Y are independent can be found in Székely, Rizzo, and Bakirov (2007) in the iid case, and in Davis, Matsui, Mikosch, and Wan (2018) in the time series setting where {(X_t, Y_t)} is stationary. The latter also considers the limit theory of √T (Î(X, Y; w) − I(X, Y; w)) when X and Y are not independent.

We now consider an array of models given by Y_{t,T} = A_T Y_{t−1,T} + B_T u_t, where (i) A_{21,T} = a_{21} T^{−θ} and B_{21,T} = b_{21} T^{−θ}, with θ = 1/α − 1/2; (ii) u_{t1} has Pareto-like tails, E[u_{t1}] = 0 if it exists, and has dispersion 1, so that T P(|u_{t1}| > T^{1/α}) → 1, and u_{t2} ∼ (0, 1). In other words, for fixed T, the time series {Y_{t,T}, t ∈ Z} satisfies the VAR(1) equations with coefficient matrix A_T. The observations Y_1, …, Y_T are then considered to come from this model. To lighten the notation going forward, we will often suppress the dependence of Y_t on T.

Claim 1. We first show that the assumptions imply that (1/T^{2/α}) ∑_{t=1}^T u_{t1}^2 ⇒ S_{uu}, where S_{uu} is a stable random variable with index α/2. This follows essentially from Lemma 1, which shows that for j ≠ k,

(1/(T^{1/α} log T)) ∑_{t=1}^T u_{t−j,1} u_{t−k,1} = O_p(1)   (23)

and (1/T) ∑_{t=1}^T u_{t−j,2} u_{t−k,2} = o_p(1) by the ergodic theorem. Using the ideas in Davis and Resnick (1986) and the continuous mapping theorem, it is straightforward to obtain the limit upon summing out j and k.

We first note that the OLS estimate of A is given by Â = (∑_{t=1}^{T−1} Y_{t+1} Y_t′)(∑_{t=1}^{T−1} Y_t Y_t′)^{−1}, as in (26), and hence Â − A = (∑_{t=1}^{T−1} B u_{t+1} Y_t′)(∑_{t=1}^{T−1} Y_t Y_t′)^{−1}, as in (27). (For simplicity, we have terminated the second sum in (26) and (27) at T − 1 instead of T, but this has no bearing on the asymptotics.) We begin by analyzing the terms in the matrix of cross products ∑_{t=1}^{T−1} Y_t Y_t′, using the representation in (22).

The second term converges to 0 in probability by (32) and the fact that the determinant goes to infinity in probability. To show that the first term converges to 0, since |Σ̂_e|/|Σ̂_ê| →p 1, it suffices to show that the matrix Σ̂_e^{−1} remains bounded.
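Assumption (ii) can be illustrated by simulation: for a symmetric Pareto variable with tail index α and dispersion 1 (so that T P(|u_{t1}| > T^{1/α}) = 1 exactly), the normalized sum S_{11,T} = T^{−2/α} ∑ u_{t1}^2 fluctuates without diverging as T grows, in line with its α/2-stable limit. A quick check with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 1.5          # tail index in (1, 2): infinite variance, finite mean

def S11(T, reps=200):
    """Normalized sums T^{-2/alpha} * sum_t u_{t1}^2 across Monte Carlo reps."""
    # symmetric Pareto: P(|u| > x) = x^{-alpha} for x >= 1, so dispersion is 1
    u = rng.uniform(size=(reps, T)) ** (-1 / alpha)
    u *= rng.choice([-1.0, 1.0], size=u.shape)
    return (u ** 2).sum(axis=1) * T ** (-2 / alpha)

# medians stay of the same order as T grows: S_{11,T} = O_P(1)
print(np.median(S11(1_000)), np.median(S11(10_000)))
```

The median of S_{11,T} stabilizes while its distribution keeps a heavy right tail, which is exactly the O_P(1)-but-non-degenerate behavior the proofs exploit.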
Now Σ̂_e can be analyzed using (34): its (1,1) entry is T^{2/α−1} B_{11}^2 S_{11,T} up to smaller-order terms, which gives the asserted representation for e^c_{t1}. Also, from (35) and (36), we obtain the corresponding representation for e^c_{t2}. Since c_T →p 0, we conclude as claimed.

References

Moment Tests of Independent Components
Tests of Short Memory with Thick-Tailed Errors
Kernel Independent Component Analysis
On the Emergence of a Power Law in the Distribution of COVID-19 Cases
Regression with Non-Gaussian Stable Distributions: Some Sampling Results
The Economic Impacts of Natural Disasters: A Review of Models and Empirical Studies
Addressing COVID-19 Outliers in BVARs with Stochastic Volatility
Efficient Independent Component Analysis
Heavy-Tailed Distributions, Correlations, Kurtosis and Taylor's Law of Fluctuation Scaling
Independent Component Analysis: A New Concept
Independent Component Analysis with Heavy Tails Using Distance Covariance
Applications of Distance Correlation to Time Series
Limit Theory for the Sample Correlation Function of Moving Averages
Bootstrapping M-Estimates in Regression and Autoregression with Infinite Variance
Heavy-Tails and Paretian Distributions in Econometrics
Characteristic-Function Based Independent Component Analysis
Statistical Inference for Independent Component Analysis: Application to Structural VAR Models
Independent Component Analysis Through Product Density Estimation
Long-Run Neutrality of Demand Shocks: Revisiting Blanchard and Quah (1989) with Independent Structural Shocks
A Simple General Approach to Inference about the Tail of a Distribution
Independent Component Analysis
Independent Component Analysis: Algorithms and Applications
Independent Component Analysis: Recent Advances
Estimation of a Structural Vector-Autoregression Model Using Non-Gaussianity
Estimation and Inference of Impulse Responses by Local Projections
Measuring Multivariate Association and Beyond
Measuring Uncertainty
Regression with Non-Gaussian Stable Disturbances: Some Sampling Results
Structural Vector Autoregressive Analysis
Identification and Estimation of Non-Gaussian Structural Autoregressions
How to Estimate a VAR after March 2020
Uncertainty and Business Cycles: Exogenous Impulse or Endogenous Response?
Independent Component Analysis via Distance Covariance
Identification of Independent Structural Shocks in the Presence of Multiple Gaussian Components
Heavy Tails of OLS
Causal Inference by Independent Component Analysis: Theory and Applications
SVAR Identification from Higher Moments: Has the Simultaneous Causality Problem Been Solved?
Low Frequency Econometrics
Modeling Macroeconomic Variations after COVID-19
Latest Developments on Heavy-Tailed Distributions
Independent Component Analysis via Nonparametric Maximum Likelihood Estimation
Factor Models for Macroeconomics
Measuring and Testing Dependence by Correlation of Distances
Goodness-of-Fit Testing for Time Series Models via Distance Covariance

The distance covariance between two random vectors X and Y of dimensions m and n, respectively, is given by

I(X, Y; w) = ∫_{R^{m+n}} |ϕ_{X,Y}(s, t) − ϕ_X(s) ϕ_Y(t)|^2 w(s, t) ds dt,   (16)

where w(s, t) > 0 is a weight function and ϕ_Z(t) = E[exp(i⟨t, Z⟩)], t ∈ R^d, denotes the characteristic function of any random vector Z ∈ R^d. It is assumed that the integral in (16) is finite, which certainly holds if w(s, t) is a probability density function. One sees immediately that X and Y are independent if and only if I(X, Y; w) = 0, since in this case the joint characteristic function factors into the product of the respective marginal characteristic functions, ϕ_{X,Y}(s, t) = ϕ_X(s) ϕ_Y(t) for all (s, t) ∈ R^{m+n}. Now if the weight function factors into a product, i.e., w(s, t) = w_1(s) w_2(t), then under suitable moment conditions on X and Y,

I(X, Y; w) = E[ŵ_1(X − X′) ŵ_2(Y − Y′)] + E[ŵ_1(X − X′)] E[ŵ_2(Y − Y′)] − 2 E[ŵ_1(X − X′) ŵ_2(Y − Y″)],   (17)

where ŵ_1(x) = ∫_{R^m} e^{i⟨s,x⟩} w_1(s) ds, ŵ_2(y) = ∫_{R^n} e^{i⟨t,y⟩} w_2(t) dt, and (X′, Y′), (X″, Y″) are iid copies of (X, Y). This relation is found by expanding the square in the integrand in (16) and using Fubini to interchange integration with expectation.
Precise conditions on w under which these operations are valid are given in Davis, Matsui, Mikosch, and Wan (2018). Suffice it to say that no further conditions are needed if the w_i are probability density functions.

Estimation of the Y_2 equation. Using the representation in (20), it is relatively straightforward to show that the estimation error is governed by terms of the form S_{Yu,12} and S_{uu,21}, which are stable with index α, and N_{u2}, which is normally distributed. Summarizing, we have, using an obvious notation, a decomposition of the estimation error into these terms. Since −1/α + 1/2 < 0 and 1 − 1/α − 1/2 < 0, we conclude that the estimation error converges to zero.

The proof of Proposition 2 relies on an application of Theorem 3.3 in Davis and Fernandes (2022), which establishes consistency for the unmixing matrix in an ICA model with noise. Observe that ê_t = e_t + r(T) Y_{t−1}, where r(T) = A − Â = o_P(1). By the independence of u_t and Y_{t−1}, it follows that the components of E[|u_t| |Y_{t−1}|′] are finite. Hence, in order to apply Theorem 3.3, it suffices to show that Σ̂_e − Σ̂_ê →p 0.

From (31), Σ̂_ê = Σ̂_e + r(T) Σ̂_{ye} + Σ̂_{ye}′ r(T)′ + r(T) Σ̂_y r(T)′, where Σ̂_y is the sample covariance matrix of Y_{t−1}, t = 1, …, T, and Σ̂_{ye} is the sample cross-moment matrix of Y_{t−1} and e_t. Using the relations for Â − A in Proposition 1 and the calculations leading to the limit in (28), it follows that the terms involving r(T) Σ̂_{ye} vanish in probability. Similarly, applying (29)-(30), it is straightforward to show that r(T) Σ̂_y r(T)′ →p 0 as well, which proves (33).

To finish the proof, we note the following relations:

1/T ∑_{t=1}^T e_{t1}^2 = T^{2/α−1} B_{11}^2 S_{11,T} + B_{12}^2 σ_2^2 + o_P(1),
1/T ∑_{t=1}^T e_{t2}^2 = b_{21}^2 S_{11,T} + B_{22}^2 σ_2^2 + o_P(1),
1/T ∑_{t=1}^T e_{t1} e_{t2} = T^{1/α−1/2} B_{11} b_{21} S_{11,T} + B_{12} B_{22} σ_2^2 + o_P(1),

where S_{11,T} = T^{−2/α} ∑_{t=1}^T u_{t1}^2 = O_P(1) and σ_2^2 = var(u_{t2}). In view of (33), the exact same relations hold for the corresponding entries of Σ̂_ê. The determinant of both sample covariance matrices is then of order

|Σ̂_e| = T^{2/α−1} B_{11}^2 S_{11,T} B_{22}^2 σ_2^2 + O_P(T^{1/α−1/2}).