Adaptive Estimation for Non-stationary Factor Models and a Test for Static Factor Loadings
Weichi Wu and Zhou Zhou
December 29, 2020

Abstract. This paper considers the estimation and testing of a class of high-dimensional non-stationary time series factor models with evolutionary temporal dynamics. In particular, the entries and the dimension of the factor loading matrix are allowed to vary with time while the factors and the idiosyncratic noise components are locally stationary. We propose an adaptive sieve estimator for the span of the varying loading matrix and the locally stationary factor processes. A uniformly consistent estimator of the effective number of factors is investigated via eigenanalysis of a non-negative definite time-varying matrix. A high-dimensional bootstrap-assisted test for the hypothesis of static factor loadings is proposed by comparing the kernels of the covariance matrices of the whole time series with their local counterparts. We examine our estimator and test via simulation studies and real data analysis.

1 Introduction
Advances in technology have made it easy to record simultaneously a large number of stochastic processes of interest over a relatively long period of time, where the underlying data-generating mechanisms of the processes are likely to evolve over the long observation span. As a result, both high-dimensional time series analysis (Wei (2018)) and non-stationary time series analysis (Dahlhaus (2012)) have undergone unprecedented developments over the last two decades. This paper focuses on the following evolutionary linear factor model for a multivariate non-stationary time series:
x_{i,n} = A(i/n) z_{i,n} + e_{i,n}, (1.1)
where {x_{i,n}}_{i=1}^n is a p-dimensional observed time series, A(t): [0,1] -> R^{p x d(t)} is a matrix-valued function of possibly time-varying factor loadings, the number of factors d(t) is assumed to be a piecewise constant function of time, {z_{i,n}}_{i=1}^n is a d(i/n)-dimensional unobserved sequence of common factors, and {e_{i,n}}_{i=1}^n are the idiosyncratic components. Here p = p_n may diverge to infinity with the time series length n, and d(t) is typically much smaller than p uniformly over t. Note that x_{i,n}, z_{i,n} and e_{i,n} are allowed to be locally stationary processes whose generating mechanisms vary with time; see (4.4) for the detailed formulation. Throughout the article we assume that {e_{i,n}} and {z_{i,n}} are centered. The version of model (1.1) with constant loadings is among the most popular dimension-reduction tools for the analysis of multivariate stationary time series (Stock and Watson (2011), Tsay (2013), Wei (2018)). According to the model assumptions adopted and the estimation methods used, the recent literature on linear time series factor models falls mainly into two types. The celebrated cross-sectional averaging method (summarized in Stock and Watson (2011)), which is popular in the econometric literature on linear factor models, exploits the assumption of weak dependence among the vector components of e_{i,n} and hence achieves de-noising via cross-sectional averaging. See for instance Stock and Watson (2002), Bai and Ng (2002), Bai (2003) and Forni et al. (2000) among many others. One advantage of the cross-sectional averaging method is that it allows for very high dimensionality.
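As a concrete illustration of model (1.1), the following minimal sketch simulates an evolutionary factor model with a smoothly time-varying loading matrix. The specific loading function, factor dynamics and noise law are illustrative assumptions and are not the designs used later in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 500, 20, 2

def A(t):
    # Illustrative smoothly time-varying p x d loading matrix A(t).
    base = np.linspace(1.0, 2.0, p)
    return np.column_stack([base * (1.0 + 0.5 * t), base * np.sin(np.pi * t)])

# Locally stationary factors: AR(1) whose coefficient drifts smoothly with i/n.
z = np.zeros((n, d))
for i in range(1, n):
    phi = 0.2 + 0.3 * (i / n)
    z[i] = phi * z[i - 1] + rng.standard_normal(d)

e = 0.5 * rng.standard_normal((n, p))                            # idiosyncratic noise
x = np.stack([A((i + 1) / n) @ z[i] + e[i] for i in range(n)])   # model (1.1)
```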
In general the method requires p to diverge to achieve consistency, and the estimation accuracy improves as p gets larger under the corresponding model assumptions. On the other hand, the linear factor model can also be fitted by exploring the relationship between the factor loading space and the auto-covariance or spectral density matrices of the time series under appropriate assumptions. This method dates back at least to the works of Anderson (1963), Brillinger (2001) and Pena and Box (1987), among others, for fixed-dimensional multivariate time series, and has been extended to the high-dimensional setting by the recent works of Lam et al. (2011), Lam and Yao (2012), Wang et al. (2019) and others. The latter method allows for stronger contemporaneous dependence among the vector components and is consistent when p is fixed, under the requirement that the idiosyncratic components form a white noise. To date, the literature on evolutionary linear factor models is scarce, and the existing results focus on extensions of the cross-sectional averaging method. Among others, Breitung and Eickmeier (2011) and Yamamoto and Tanaka (2015) considered testing for structural changes in the factor loadings, and Motta et al. (2011) and Su and Wang (2017) considered the evolutionary model (1.1) using the cross-sectional averaging method. Eichler et al. (2011) and Barigozzi et al. (2021) studied non-stationary dynamic factor models. In this paper, we extend the second estimation method mentioned in the last paragraph to the case of evolutionary factor loadings with locally stationary factor and idiosyncratic component time series whose data-generating mechanisms change smoothly over time. Using this framework, our approach to factor model estimation contributes to the literature mainly in the following two aspects. (a) To estimate the time-varying loading matrix, the prevailing approach in the literature is the local-constant kernel estimator; see for example Motta et al. (2011) and Su and Wang (2017). It appears difficult to extend the local-constant method to general local polynomial methods for factor models under the cross-sectional averaging set-up, and therefore the estimation accuracy of the existing methods is not adaptive to the smoothness (with respect to time) of the factor loading matrix function. In this paper, we propose an alternative adaptive estimation method based on the method of sieves (Grenander (1981), Chen (2007)). The sieve method is computationally simple to implement and has the advantage of being adaptive to the unknown smoothness of the target function if certain linear sieves such as the Fourier basis (for periodic functions), the Legendre polynomials or orthogonal wavelets are used (Chen (2007), Wang and Xiang (2012)). Specifically, we adapt the method of sieves to estimate the high-dimensional auto-covariance matrices of x_{i,n} at each time point and subsequently estimate the span of the loadings A(t) at each t, exploiting the relationship between A(.) and the kernel of the latter local auto-covariance matrices. We will show that the span of A(.) can be estimated at a rate independent of p uniformly over time provided that all factors are strong, i.e., the columns of the loading matrix have Euclidean norms of order p^{1/2}, extending the corresponding result for factor models with static loadings established in Lam et al. (2011).
(b) In most of the literature on time-varying factor models, such as Motta et al. (2011) and Su and Wang (2017), the time-varying loading matrix is estimated under the assumption that the number of factors is constant over time. Typically, further requirements on the factor process, such as independence or time-invariance of its covariance matrix, have to be imposed. In this paper, we model the factor process as a general locally stationary time series and allow the number of factors to be time-varying. Uniform consistency of the estimated span of the loading matrix, as well as of the number of factors, will be established without assuming that the positive eigenvalues of the corresponding matrices are distinct, an assumption commonly posited in the literature on factor models. Testing whether A(.) is constant over time is important in applications of (1.1). In the literature, among others, Breitung and Eickmeier (2011) proposed LR, LM and Wald statistics for testing a static factor model against an alternative of piecewise constant loadings, and Yamamoto and Tanaka (2015) improved the power of Breitung and Eickmeier (2011) by maximizing the test statistic over possible numbers of the original factors. Assuming piecewise stationarity, Barigozzi et al. (2018) estimated the change points of a factor model via wavelet transformations. Su and Wang (2017) considered an L_2 test of static factor loadings under the cross-sectional averaging method, assuming that each component of e_{i,n} is a martingale difference sequence. Here we propose a high-dimensional L_infinity, or maximum deviation, test of the time-invariance of the span of A(.), which utilizes the observation that under the null hypothesis of a static span of loadings the kernel of the full-sample auto-covariance matrices coincides with all of its local counterparts, while this coincidence is likely to fail when the span of A(.) is time-varying. Using the uniform convergence rates of the estimated factor loadings established in this paper, the test statistic will be shown to be asymptotically equivalent to the maximum deviation of the sum of a high-dimensional locally stationary time series under some mild conditions. A multiplier bootstrap procedure with overlapping blocks is adopted to approximate the critical values of the test. The bootstrap will be shown to be asymptotically correct under the null and powerful under a large class of local alternatives. The assumptions of our testing procedure differ from those of the existing literature in the following two aspects. (c) Under the null hypothesis of constant A(.), the common components of the time series, i.e., A(i/n) z_{i,n}, considered in the above-mentioned works are stationary or have time-invariant variance-covariance matrices. With the locally stationary z_{i,n} assumed in this paper, under the null hypothesis the common components are allowed to be locally stationary with smoothly time-varying variance-covariance matrices. (d) The validity of the tests in the above works is built on the divergence of both the length of the time series n and the dimension of the time series p. In contrast, our proposed test is proved to be asymptotically correct when p is fixed or diverges slowly with n. Therefore, our test is suitable for investigating the loading matrix of fixed- or moderately high-dimensional vector time series such as the COVID-19 death data investigated in Section 11.2, which records the daily COVID-19 deaths over n = 591 days in 57 places of the U.S. and three countries in the Compacts of Free Association with the U.S. (hence p = 60). The paper is organized as follows.
Section 2 introduces some notation. Section 3 explains the local stationarity of the factor model (1.1). The assumptions of the model are presented in Section 4. Sections 5 and 6 discuss the estimation of the evolutionary factor loading matrices and the test of static factor loadings, respectively, and the corresponding theoretical results are contained in Section 7. Section 8 investigates the power performance of the test of static factor loadings. Section 9 presents methods for tuning parameter selection. Numerical studies are displayed in Section 10. Section 11 illustrates a real-world financial data application and a COVID-19 data analysis. Section 12 provides the proofs of Theorem 7.1, Theorem 7.2, Theorem 7.3, Proposition 7.3 and Theorem 7.4. The proofs of the remaining theorems, propositions, lemmas and corollaries are relegated to the online supplemental material.

2 Notation
For two series a_n and b_n, write a_n ≍ b_n if there exist 0 < m_1 < m_2 < infinity such that m_1 <= liminf |a_n|/|b_n| <= limsup |a_n|/|b_n| <= m_2 < infinity. Write a_n ≲ b_n (a_n ≳ b_n) if there exists a uniform constant M such that |a_n| <= M |b_n| (|a_n| >= M |b_n|). For any matrix F, let λ_max(F) be its largest eigenvalue and λ_min(F) its smallest eigenvalue. Let ||F||_F = (trace(F'F))^{1/2} denote the Frobenius norm, and let ||F||_min be the smallest nonzero singular value of F. Denote by vec(F) the vector obtained by stacking the columns of F. Let ||F||_2 = (λ_max(FF'))^{1/2}; in particular, if F is a vector, then ||F||_2 = (F'F)^{1/2}. For any vector or matrix A = (a_{ij}), let |A|_∞ = max_{i,j} |a_{ij}|. For any integer v, denote by I_v the v x v identity matrix. Write |I| for the length of an interval I.

3 Local stationarity of model (1.1)
We allow the number of factors and the dimension of the loading matrix of model (1.1) to be time-varying. Since the number of factors and the dimension are integers, it is delicate to define a 'smoothly changing' factor number or 'smoothly changing' dimensions directly, yet the concept of 'smooth change' is the key assumption of locally stationary models and of nonparametric smoothing approaches. Moreover, in the current literature many assumptions, including stationarity and dependence-strength conditions, are not directly applicable to time series with possibly changing dimensions, such as z_{i,n} in model (1.1). On the other hand, the dimension of x_{i,n} is time-invariant, which makes it convenient to assume that x_{i,n} is locally stationary. In the following we show that the evolutionary factor model (1.1) can be derived from a factor model with a fixed factor dimension d in which the rank of the loading matrix varies with time. The latter representation is beneficial for the theoretical investigation of the model and provides insight into how smooth changes in the loading matrix cause the dimensionality of the factor space to change. The parameter δ below refers to the factor strength and will be discussed in condition (C1) later. Let z*_{i,n} and e_{i,n} be d- and p-dimensional locally stationary time series. Here local stationarity refers to a slowly or smoothly time-varying data-generating mechanism of a time series; we refer to Dahlhaus (1997) and Zhou and Wu (2009), among others, for rigorous treatments.
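The filter representation of local stationarity used below, a process driven by i.i.d. innovations through a slowly varying system, can be sketched as a truncated time-varying MA(infinity); the coefficient function g here is an illustrative assumption. Note that replacing an innovation k steps back changes the output by a term of geometric order in k in this example, which is the intuition behind the physical dependence measures introduced in Section 4.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lag_max = 1000, 200                  # lag_max truncates the MA(infinity) filter

def g(t):
    return 0.2 + 0.4 * t                # smoothly time-varying MA coefficient

eps = rng.standard_normal(n + lag_max)  # i.i.d. innovations (..., eps_{i-1}, eps_i)
x = np.empty(n)
for i in range(n):
    coeffs = g((i + 1) / n) ** np.arange(lag_max)   # geometric decay: short-range dependence
    x[i] = coeffs @ eps[i + lag_max: i: -1]         # x_i = sum_j g(i/n)^j eps_{i-j}
```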
Consider the following model with a fixed-dimensional loading matrix:
x_{i,n} = A*(i/n) z*_{i,n} + e_{i,n}. (3.1)
Using the singular value decomposition (SVD) A*(t) = p^{(1-δ)/2} U(t) Σ(t) W'(t), model (3.1) can be written as
x_{i,n} = p^{(1-δ)/2} U(i/n) Σ(i/n) W'(i/n) z*_{i,n} + e_{i,n}. (3.2)
Let d(t) be the number of nonzero singular values σ_u(t); equation (3.2) can be further written as
x_{i,n} = p^{(1-δ)/2} Ũ(i/n) Σ̃(i/n) W̃'(i/n) z*_{i,n} + e_{i,n}, (3.3)
where Σ̃(i/n) is the matrix obtained by deleting the rows and columns of Σ(i/n) whose diagonal entries are zero, and Ũ(i/n) and W̃(i/n) are the matrices resulting from the deletion of the corresponding columns and rows of U(i/n) and W(i/n), respectively. Let A(i/n) = p^{(1-δ)/2} Ũ(i/n) Σ̃(i/n) and z_{i,n} = W̃'(i/n) z*_{i,n}; then A(i/n) is a p x d(i/n) matrix and z_{i,n} is a d(i/n)-dimensional locally stationary vector. Notice that z̃_{i,n} := W'(i/n) z*_{i,n} in (3.2) is the collection of all factors in the sense that all the entries of z_{i,n} are elements of z̃_{i,n} for each i. The fact that model (1.1) can be obtained from model (3.3) has the following implications. First, though we allow d(t) to jump from one integer to another over time, it is appropriate to assume that x_{i,n} of (1.1) is locally stationary, since model (3.1) is locally stationary as long as z*_{i,n} and e_{i,n} are locally stationary processes and each element of A*(t) is Lipschitz continuous; notice that the dimensions of z*_{i,n} and A*(t) are time-invariant. Second, we observe from the SVD that ||A*(i/n)||_min = ||A(i/n)||_min will be close to zero in a small neighborhood around the times when d(t) of model (1.1) changes, in which case it is difficult to estimate the rank of A*(.), i.e., the number of factors, accurately. We will exclude such small neighborhoods in our asymptotic investigations. A varying d(t) has been considered in piecewise stationary factor models and in factor models with piecewise constant loading matrices, where the jump points of d(t) correspond to the change points of the factor models; see for instance Barigozzi et al. (2018) and Ma and Su (2018) among others.

4 Model assumptions
Let z̃_{i,n} be the vector consisting of all factors that appear during the considered period, so that the components of z_{i,n} are always elements of z̃_{i,n}. Adapting the formulation in Zhou and Wu (2009), we model the locally stationary time series x_{i,n}, z̃_{i,n} and e_{i,n}, 1 <= i <= n, as
x_{i,n} = G(i/n, F_i), z̃_{i,n} = Q(i/n, F_i), e_{i,n} = H(i/n, F_i), (4.4)
where F_i = (..., ε_{i-1}, ε_i) and (ε_i)_{i in Z} are i.i.d. random elements. The dependence measures for x_{i,n}, z̃_{i,n} (or z_{i,n}, equivalently) and e_{i,n} in the L^l norm are defined as (Zhou and Wu (2009))
δ_{G,l,j}(k) = sup_{t in [0,1]} ||G_j(t, F_i) - G_j(t, F_{i,i-k})||_{L^l},
with δ_{G,l}(k) = max_j δ_{G,l,j}(k), and analogously for Q and H, where F_{i,g} denotes F_i with ε_g replaced by an i.i.d. copy ε'_g. These quantities quantify the magnitude of change of the systems G, Q, H in the L^l norm when the inputs of the systems k steps ahead are replaced by their i.i.d. copies. To formally state the requirements of model (1.1), we define some further notation. At time i/n, the d(i/n)-dimensional factor vector is z_{i,n} = (z_{i,l_1,n}, ..., z_{i,l_{d(i/n)},n})', where {l_1, ..., l_{d(i/n)}} ⊂ {1, ..., d} is the index set of the factors present at time i/n, and Σ_x(t,k) = E(G(t, F_{i+k}) G'(t, F_i)). By Section 3 we naturally assume that x_{i,n} is locally stationary, which implies that Σ_x(t,k) is smooth with respect to t. We assume the following conditions for model (1.1). Condition (A1) concerns the boundedness of the loading matrix, while (A2) means that Σ_x(t,k) can be approximated by the basis expansion; the approximation error rate g_{J_n,K,M} diminishes as J_n increases, and higher differentiability typically yields a more accurate approximation rate. We present condition (C1) on the factor strength (cf. Section 2.3 of Lam et al. (2011)) as follows.
(C1) Write A(t) = (a_1(t), ..., a_{d(t)}(t)), where a_s(t), 1 <= s <= d(t), are p-dimensional vectors. Then sup_{t in [0,1]} ||a_s(t)||_2^2 ≍ p^{1-δ} for 1 <= s <= d(t) and some constant δ in [0,1]. Besides, the matrix norm ||A(t)||_min is bounded below accordingly on a collection of time intervals T_{η_n} ⊆ [0,1] determined by a sequence η_n.
The assumptions for z̃_{i,n} and e_{i,n} are now in order.
(M1) Short-range dependence holds for both z̃_{i,n} and e_{i,n} in the L^l norm, i.e., the dependence measures δ_{Q,l}(k) and δ_{H,l}(k) are summable in k.
(M2) is a uniform moment condition on z̃_{i,n} and e_{i,n}.
(M3) For t, s in [0,1], there exists a constant M such that the stochastic Lipschitz condition ||Q(t, F_i) - Q(s, F_i)||_{L^l} + ||H(t, F_i) - H(s, F_i)||_{L^l} <= M |t - s| holds. Conditions (M1)-(M3) mean that each coordinate process of z̃_{i,n} and e_{i,n} is a standard short-memory locally stationary time series as defined in the literature; see for instance Zhou and Wu (2009). Furthermore, the moment condition (M2) implies that for 0 <= t <= 1 all elements of the matrices Σ_{ze}(t,k) and Σ_e(t,k) are bounded in the L^4 norm. We then postulate the following assumptions on the covariance matrices of the common factors z_{i,n} and the idiosyncratic components e_{i,n}, which are needed for the spectral decomposition:
(S1) For t in [0,1] and k = 1, ..., k_0, all components of Σ_e(t,k) are 0.
(S2) For k = 0, 1, ..., k_0, Σ_z(t,k) is of full rank on some sub-interval I_0 of [0,1].
(S3) For t in [0,1] and k = 1, ..., k_0, all components of Σ_{ez}(t,k) are 0.
Condition (S1) indicates that (e_{i,n}) has no auto-covariance up to order k_0, which is slightly weaker than the requirement, used in the literature, that (e_{i,n}) be a white noise process. Condition (S2) implies that for 1 <= i <= n there is a period during which no linear combination of the components of z_{i,n} is a white noise that could be absorbed into e_{i,n}. (S3) implies that z_{i,n} and e_{i+k,n} are uncorrelated for any k >= 0. Condition (S4) requires weak correlation between z_{i+k,n} and e_{i,n}; in fact it is the non-stationary extension of condition (i) in Theorem 1 of Lam et al. (2011) and condition (C6) of Lam and Yao (2012). Though (C6) of Lam and Yao (2012) assumes a rate of o(p^{1-δ}), it requires standardization of the factor loading matrix A. If d(t) is piecewise constant with a bounded number of change points, then |T_{η_n}| -> 1 as n -> infinity and η_n -> 0. If d(t) ≡ d, we can assume that T_{η_n} = [0,1] for some sufficiently small positive η_n := η > 0.
Remark 4.1. We now discuss the equivalent assumptions on the fixed-dimensional loading matrix model (3.1) and assume conditions (S) for the corresponding quantities. By construction, conditions (M) are satisfied. In particular, for model (3.1), Σ_{z*e*}(t,k), k = 0, ..., k_0, has a bounded K-th order derivative. Condition (C1) is satisfied with T_{η_n} corresponding to the period during which the smallest nonzero eigenvalue of Σ(.) exceeds η_n.

5 Estimation of the evolutionary factor loading matrices
Observe from equation (1.1) and conditions (S1)-(S4) that, for some pre-specified integer k_0 and k = 1, ..., k_0,
Σ_x(t,k) = A(t)(Σ_z(t,k) A'(t) + Σ_{ze}(t,k)), and we define Λ(t) = Σ_{k=1}^{k_0} Σ_x(t,k) Σ_x'(t,k).
Therefore, in principle A(t) can be identified via the null space of Λ(t). The use of Λ(t) was advocated in Lam et al. (2011), and in this paper we aim at estimating a set of time-varying orthonormal bases of this time-varying null space, which is identifiable up to rotation, to characterize A(t). Therefore we do not impose the condition, utilized by Lam et al. (2011), that the loading matrix is normalized for identification. As we discussed in the introduction, fitting factor models using relationships between the factor space and the null space of the auto-covariance matrices has a long history. In the following we propose a nonparametric sieve-based method for time-varying loading matrix estimation which is adaptive to the smoothness (with respect to t) of the covariance function Σ_x(t,k). For a pre-selected set of orthonormal basis functions {B_j(t)}_{j=1}^infinity satisfying the basis approximation condition (A2), we approximate Σ_x(t,k) by a finite- but diverging-order basis expansion,
Σ_x(t,k) ≈ Σ_{j=1}^{J_n} B_j(t) Σ_{x,j,k}, Σ_{x,j,k} = ∫_0^1 Σ_x(t,k) B_j(t) dt, (5.3)
where the order J_n diverges to infinity. The speed of divergence is determined by the smoothness of Σ_x(t,k) with respect to t.
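A sketch of the sieve estimate of the local lag-k auto-covariance just described, using normalized Legendre polynomials on [0,1] (the basis adopted in the paper's empirical work); the exact normalization and the einsum-based implementation are our own choices.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_basis(ts, J):
    # First J Legendre polynomials mapped to [0,1] and normalized in L2([0,1]).
    u = 2.0 * np.asarray(ts, dtype=float) - 1.0
    B = np.column_stack([legendre.legval(u, np.eye(J)[j]) for j in range(J)])
    return B * np.sqrt(2.0 * np.arange(J) + 1.0)

def sieve_autocov(x, k, J):
    # Coefficients Sigma_hat_{x,j,k} = n^{-1} sum_i B_j(i/n) x_{i+k} x_i' and the
    # resulting estimate t -> M_hat(J, t, k) of Sigma_x(t, k), cf. (5.4)-(5.5).
    n, p = x.shape
    B = legendre_basis(np.arange(1, n - k + 1) / n, J)            # (n-k, J)
    coef = np.einsum('ij,ip,iq->jpq', B, x[k:], x[:n - k]) / n    # (J, p, p)
    return lambda t: np.tensordot(legendre_basis([t], J)[0], coef, axes=1)
```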
Motivated by (5.3), we propose to estimate Λ(t) by the following Λ̂(t):
Λ̂(t) = Σ_{k=1}^{k_0} M̂(J_n,t,k) M̂'(J_n,t,k), where M̂(J_n,t,k) = Σ_{j=1}^{J_n} B_j(t) Σ̂_{x,j,k} (5.4) and Σ̂_{x,j,k} = n^{-1} Σ_{i=1}^{n-k} B_j(i/n) x_{i+k,n} x'_{i,n}. (5.5)
Write V̂(t) = (v̂_1(t), ..., v̂_{d(t)}(t)), where the v̂_i(t)'s are the eigenvectors of Λ̂(t) corresponding to λ_1(Λ̂(t)), ..., λ_{d(t)}(Λ̂(t)). Then we estimate the column space of A(t) by span(V̂(t)). (5.6)
To implement this approach it is required to determine d(t), which we estimate using a modified version of the eigen-ratio statistic advocated by Lam and Yao (2012); that is, we estimate
d̂_n(t) = argmin_{1<=i<=R(t)} λ_{i+1}(Λ̂(t)) / λ_i(Λ̂(t)), (5.7)
where R(t) is the largest integer less than or equal to p/2 such that
(λ_{R(t)}(Λ̂(t)) / Σ_{j=1}^p λ_j(Λ̂(t)))^{1/2} >= C_0 η_n / log n (5.8)
for some positive constant C_0. Asymptotic theory established in Section 7 guarantees that the percentage of variance explained by each of the first d(t) eigenvalues will be much larger than η_n^2 / log^2 n on T_{η_n} with high probability, which motivates us to restrict the upper bound of the search range, R(t), using (5.8).

6 Test of static factor loadings
It is of practical interest to test H_0: span(A(t)) = span(A), where A is a p x d matrix; in other words, one can find a time-invariant matrix A to represent the factor loading matrices throughout time. Without loss of generality, we shall assume that A(t) = A under the null hypothesis throughout the rest of the paper if no confusion arises. Under the null hypothesis, Σ_x(t,k) = A Σ_z(t,k) A' + A Σ_{ze}(t,k) for k ≠ 0. Observe that testing H_0 is more subtle than testing covariance stationarity of x_{i,n}, as both z_{i,n} and e_{i,n} can be non-stationary under the null. By equation (1.1) we have, under H_0, that the lag-k auto-covariance matrices share the common column space span(A). Consider the following quantity Γ_k and its estimate Γ̂_k:
Γ_k = γ_k γ'_k with γ_k = ∫_0^1 Σ_x(t,k) dt, and Γ̂_k = γ̂_k γ̂'_k with γ̂_k = Σ_{i=1}^{n-k} x_{i+k,n} x'_{i,n} / n;
write Γ = Σ_{k=1}^{k_0} Γ_k and Γ̂ = Σ_{k=1}^{k_0} Γ̂_k. Then the kernel space associated with A can be estimated by the kernel of Γ̂ under H_0. Let f̂_i and f_i, i = 1, ..., p-d, be the orthonormal eigenvectors of Γ̂ and Γ corresponding to (λ_{d+1}(Γ̂), ..., λ_p(Γ̂)) and (λ_{d+1}(Γ), ..., λ_p(Γ)), respectively, and write F̂ = (f̂_1, ..., f̂_{p-d}) and F = (f_1, ..., f_{p-d}). The test is constructed by segmenting the time series into non-overlapping equal-sized blocks of size m_n. Without loss of generality, consider n - k_0 = m_n N_n for integers m_n and N_n. Define for 1 <= h <= N_n the index set b_h = ((h-1)m_n + 1, ..., h m_n) and the statistic
Σ̂_{x,h,k} = m_n^{-1} Σ_{i in b_h} x_{i+k,n} x'_{i,n}. (6.1)
Then we define the test statistic
T̂_n = sqrt(m_n) max_{1<=k<=k_0} max_{1<=h<=N_n} |F̂' Σ̂_{x,h,k}|_∞, (6.2)
where d̂_n is an estimate of d. The test T̂_n utilizes the observation that the kernel space of the full-sample statistic Γ̂ coincides with that of Σ_x(t,k) for each t under the null, while the coincidence is likely to fail under the alternative. Hence the products of f̂_i, 1 <= i <= p-d, and Σ̂_{x,h,k} should be small in the L^∞ norm uniformly in h under the null, while some of those products are likely to be large under the alternative. We then propose a multiplier bootstrap procedure with overlapping blocks to determine the critical values of T̂_n. Define for 1 <= i <= m_n, 1 <= h <= N_n, the (p-d) x p matrices F̂' x_{(h-1)m_n+i+k,n} x'_{(h-1)m_n+i,n}, k = 1, ..., k_0, and let l̂_i denote the k_0 N_n (p-d) p-dimensional vector that stacks their vectorizations over h and k (cf. (6.3) and (6.4)). For given m_n, let ŝ_{j,w_n} = Σ_{r=j}^{j+w_n-1} l̂_r for 1 <= j <= m_n - w_n + 1 and ŝ_{m_n} = Σ_{r=1}^{m_n} l̂_r, where w_n = o(m_n) and w_n -> infinity is the window size. Define
κ̂ = (w_n (m_n - w_n + 1))^{-1/2} Σ_{j=1}^{m_n-w_n+1} (ŝ_{j,w_n} - (w_n/m_n) ŝ_{m_n}) R_j,
where the R_j are i.i.d. standard normal variables independent of the data. Then we have the following algorithm for testing static factor loadings:
Algorithm for implementing the multiplier bootstrap:
(1) Select m_n and w_n by the minimal volatility (MV) method that will be described in Section 9.2.
(2) Generate B (say 1000) conditionally i.i.d. copies κ̂^{(r)} of κ̂ and let K_r = |κ̂^{(r)}|_∞.
(3) Let K_{(1)} <= ... <= K_{(B)} be the order statistics of K_r, 1 <= r <= B. Then we reject H_0 at level α if T̂_n > K_{(⌈(1-α)B⌉)}. Let B* = min{r : K_{(r)} >= T̂_n}; the corresponding p-value of the test can be approximated by 1 - B*/B.
To implement our test, d will be estimated by
d̂_n = argmin_{1<=i<=R} λ_{i+1}(Γ̂) / λ_i(Γ̂) (6.6)
for some constant R satisfying (5.8) with Λ̂(t) there replaced by Γ̂.
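The eigen-ratio rule just used in (6.6) is the same device as (5.7). Putting the pieces together, the following sketch consolidates the Section 5 estimator: form Λ̂(t), extract its leading eigenvectors as a basis for span(Â(t)), and pick the number of factors by the eigen-ratio. It reuses sieve_autocov from the earlier sketch and, for simplicity, replaces the search-range refinement (5.8) with the crude bound R = ⌊p/2⌋.

```python
import numpy as np

def estimate_factor_structure(x, t, J, k0=3):
    # Lambda_hat(t) = sum_{k=1}^{k0} M_hat(J,t,k) M_hat(J,t,k)', cf. (5.3)-(5.5).
    p = x.shape[1]
    Lam = np.zeros((p, p))
    for k in range(1, k0 + 1):
        M = sieve_autocov(x, k, J)(t)
        Lam += M @ M.T
    evals, evecs = np.linalg.eigh(Lam)
    evals, evecs = evals[::-1], evecs[:, ::-1]               # sort descending
    R = p // 2                                               # crude stand-in for (5.8)
    d_hat = 1 + int(np.argmin(evals[1:R + 1] / evals[:R]))   # eigen-ratio (5.7)
    V_hat = evecs[:, :d_hat]                                 # basis of span(A_hat(t)), cf. (5.6)
    return d_hat, V_hat
```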
7 Asymptotic Results
Theorem 7.1 provides the estimation accuracy of Λ̂(t) obtained by the sieve method.
Theorem 7.1. Assume conditions (A1), (A2), (C1), (M1), (M2), (M3) and (S1)-(S4) hold. Then sup_{t in [0,1]} ||Λ̂(t) - Λ(t)||_F = O_P(p^{2-δ} ν_n), where the rate ν_n is specified in the proof. From the proof, we shall see that ||Λ(t)||_F is of order p^{2-2δ} uniformly for t in [0,1]. Hence, if we further assume that p^δ ν_n = o(1), the estimation error of Λ̂(t) is negligible compared with the magnitude of Λ(t). For orthonormal Legendre polynomials and trigonometric polynomials it is easy to derive that Lip_j = O(j^2); similar calculations can be performed for a large class of frequently used basis functions. The first term of ν_n is due to the stochastic variation of M̂(J_n,t,k), while the second and last terms are due to the basis approximation. Suppose the smoothness condition (A2) holds with K = m + 1 for some positive integer m; we are then able to specify the rate g_{J_n,m+1,M} for various basis functions. (a) Assume that the v-th derivative (with respect to t) of σ_{x,i,j}(t,k) is of bounded variation for all 1 <= i, j <= p and 1 <= k <= k_0; then the uniform approximation rate g_{J_n,m+1,M} is J_n^{-v+1/2} for the normalized Legendre polynomial basis. If the σ_{x,i,j}(t,k) are analytic inside and on the Bernstein ellipse, then the approximation rate decays geometrically in J_n; see for instance Wang and Xiang (2012) for more details. (b) Similar rates hold when trigonometric polynomials (when the σ_{x,i,j}(t,k) can be extended to periodic functions) or orthogonal wavelets generated by sufficiently high-order father and mother wavelets are used; see Chen (2007) for more details.
Theorem 7.2. Under the conditions of Theorem 7.1, we have assertions (i) and (ii): (i) there exist orthogonal matrices Ô_1(t) and Ô_2(t) such that the estimated bases V̂(t) of span(A(t)) and B̂(t) of its orthogonal complement are uniformly consistent on T_{η_n} for V(t)Ô_1(t) and B(t)Ô_2(t); (ii) a corresponding uniform rate holds for the estimated factors.
Assertion (i) follows from Theorem 7.1 and a variant of the Davis-Kahan theorem (Yu et al. (2015)) which does not require the separation of all non-zero eigenvalues. (i) involves orthogonal matrices Ô_1(t) and Ô_2(t) since it allows multiple eigenvalues at certain time points, which yields the non-uniqueness of the eigen-decomposition. Moreover, when all the factors are strong (δ = 0), the rate in (i) is independent of p and reduces to the uniform nonparametric sieve estimation rate for univariate smooth functions, which coincides with the well-known 'blessing of dimensionality' phenomenon for stationary factor models; see for example Lam et al. (2011). Further assuming p^δ ν_n = o(η_n), Theorem 7.2 guarantees the uniform consistency of the eigen-space estimator on T_{η_n}. Through studying the estimated eigenvectors, Theorems 7.1 and 7.2 demonstrate the validity of the estimator (5.6). The next theorem discusses the properties of the estimated eigenvalues λ_i(Λ̂(t)), i = 1, ..., p.
Theorem 7.3. Assume ν_n p^δ = o(1) and that there exists a sequence g_n -> infinity such that η_n / (g_n ν_n p^δ)^{1/2} -> infinity. Then, under the conditions of Theorem 7.2, assertions (i)-(iv) hold, bounding sup_t |λ_i(Λ̂(t)) - λ_i(Λ(t))| and giving order bounds for the leading and trailing eigenvalues on T_{η_n}.
Notice that the term p^{2δ} g_n^2 ν_n^2 / η_n^3 in (iv) is asymptotically negligible compared with the term η_n in (iii). If d(t) has a bounded number of changes, (iii) and (iv) of Theorem 7.3 indicate that the eigen-ratio estimator (5.7) is able to consistently identify the time-varying number of factors d(t) on intervals with total length approaching 1. We remark that most results on linear factor models require that the positive eigenvalues be distinct; Theorems 7.2 and 7.3 are sufficiently flexible to allow multiple eigenvalues of Λ(t). Theorem 7.3 yields the following consistency result for the eigen-ratio estimator (5.7).
Remark 7.2. If we assume that the σ_{x,i,j}(t,k) are real analytic and normalized Legendre polynomials or trigonometric polynomials (when all σ_{x,i,j}(t,k) can be extended to periodic functions) are used as the basis, we shall take J_n = M log n for some large constant M. Then Theorem 7.1 yields an approximation rate of order p^{2-δ} log n / sqrt(n). Consider η_n = log^{-c} n for some fixed constant c > 0.
For Theorem 7.2, the approximation rate in (i) is then p^δ log^{1+c} n / sqrt(n), and in (ii) it is p^{-1/2} + p^{δ/2} log^{1+c} n / sqrt(n). Our rates achieve those of Lam et al. (2011) for stationary high-dimensional time series up to a logarithmic factor. For Theorem 7.3, the approximation rate is p^{2-δ} log n / sqrt(n) for (i), and the lower bounds are arbitrarily close to p^2 log^{2+2c} n / n for (ii) and p^{2δ} log^{2+3c} n / n for (iv). These rates coincide with those of Theorem 1 and Corollary 1 of Lam and Yao (2012) up to a logarithmic factor. The above findings are consistent with results in nonparametric sieve estimation for analytic functions, where the uniform convergence rates of sieve estimators (when certain adaptive sieve bases are used) are inferior to the n^{-1/2} parametric rate only by a logarithmic factor. We also point out that the sieve approximation rates are adaptive to smoothness and will be slower when the σ_{x,i,j}(t,k) are less smooth, in which case g_{J_n,K,M} converges to zero at an adaptive but slower rate as J_n increases.
Proposition 7.1. Assume ν_n p^δ = o(1). Under the conditions of Theorem 7.2, P(d̂_n(t) = d(t) uniformly on T_{η_n}) -> 1.
Remark 7.3. In practice, if the estimated number of factors d̂(t) does not change over time, then according to our earlier discussions T_{η_n} = [0,1]. Otherwise one can set T_{η_n} by removing small neighborhoods of the points t̂_s, s = 1, ..., r, at which d̂(t) changes; see (7.2). In fact, by condition (C1), equation (7.2) corresponds to choosing η_n ≍ 1 / log^4 n when the eigenvalues of A(t) / p^{(1-δ)/2} are Lipschitz continuous.
We discuss the limiting behaviour of T̂_n of (6.2) under H_0 in this subsection. First, the following proposition indicates that with asymptotic probability one, d̂_n equals d under H_0.
Proposition 7.2. Assume conditions (A1), (A2), (C1), (M1), (M2), (M3) and (S1)-(S4) hold, and that p^δ log n / sqrt(n) = o(1). Then d̂_n = d with probability tending to one.
Next we define a quantity T̃_n which approximates T̂_n using the true quantities F and d:
T̃_n = sqrt(m_n) max_{1<=k<=k_0} max_{1<=h<=N_n} |F' Σ̂_{x,h,k}|_∞.
Recall the definitions of F̂ and F in Section 6. We shall see from Proposition 7.3 below that T̂_n - T̃_n is asymptotically negligible under some mild conditions, due to the small magnitude of ||F̂ - F||_F.
Proposition 7.3. Under the stated conditions, there exists a set of orthonormal basis vectors F of the null space of Γ such that for any sequence g_n -> infinity,
P(|T̂_n - T̃_n| >= g_n^2 p^δ sqrt(m_n) Ω_n / sqrt(n)) = O(1/g_n + p^δ log n / sqrt(n)), (7.4)
where Ω_n is a factor defined in the proof of Proposition 7.3. When l is sufficiently large, the order of the rate in (7.4) is close to p^{δ+1/2} sqrt(m_n) / sqrt(n). Hence, in order for the error in (7.4) to vanish, p can be as large as O(n^a) for any a < 1 when δ = 0. Furthermore, under the null, T̃_n is equivalent to the L^∞ norm of the average of a high-dimensional time series. Specifically, define l̃_i by replacing F̂ with F in the definition of l̂_i (cf. (6.4)), with its j-th element denoted by l̃_{i,j}. Then straightforward calculations indicate that T̃_n = |Σ_{i=1}^{m_n} l̃_i / sqrt(m_n)|_∞. Therefore we can approximate T̃_n by the L^∞ norm of a certain mean-zero Gaussian process via recent developments in high-dimensional Gaussian approximation theory; see for instance Chernozhukov et al. (2013) and Zhang and Cheng (2018). Let (y_i)_{1<=i<=m_n} be centered k_0 N_n (p-d) p-dimensional Gaussian random vectors that preserve the auto-covariance structure of (l̃_i) for 1 <= i <= m_n, and write y = Σ_{i=1}^{m_n} y_i / sqrt(m_n).
Theorem 7.4. Assume the conditions of Proposition 7.3 hold, p^δ sqrt(m_n) log n Ω_n / sqrt(n) = o(1), and assume conditions (a)-(e) (see (7.5)) together with:
(f) There exists a constant M_{2l} depending on l such that for 1 <= i <= n and all p-dimensional vectors c with ||c||_2 = 1, the inequality ||c' e_{i,n}||_{L^{2l}} <= M_{2l} ||c' e_{i,n}||_{L^2} holds. Also, max_{1<=i<=n} λ_max(E(e_{i,n} e'_{i,n})) is uniformly bounded as n and p diverge.
Then, under the null hypothesis, the Gaussian approximation (7.6) holds, where Θ_{M,j,q} = Σ_{k=M}^∞ δ_{G,q,j}(k) and Ξ_M = max_{1<=j<=p} Σ_{k=M}^∞ k δ_{G,2,j}(k), and the minimum is taken over all possible values of γ and M subject to (7.7). Conditions (a)-(e) control the magnitude of the loading matrix, the dimension and dependence of the time series x_{i,n}, as well as the non-degeneracy of the process l̃_i; such conditions are standard in the literature on high-dimensional Gaussian approximation, see for instance Zhang and Cheng (2018). Condition (f) controls the magnitude of the L^{2l} norms of projections of e_{i,n} by their L^2 norms, which essentially requires that the dependence among the components of e_{i,n} not be too strong. (f) is mild in general and is satisfied, for instance, if the components of e_{i,n} are independent, or if e_{i,n} is a sub-Gaussian random vector with a variance proxy that does not depend on p for each i. Finally, we comment that condition (d) is a mild non-degeneracy condition. For example, assuming (i) e_{s,n} is independent of e_{l,n} and z_{l,n} for s >= l, min_{1<=i<=n, 1<=j<=p} E X^2_{i,j} >= η > 0 and min_{1<=i<=n} λ_min(E(e_{i,n} e'_{i,n})) >= η' > 0 for some constants η and η', and (ii) conditions (a)-(c), (e) and (f) hold, then condition (d) holds. To see this, note that under the null hypothesis the bounds (7.9) and (7.10) hold; consequently (7.9) and (7.10) imply condition (d). The validity of the bootstrap procedure is supported by the following theorem.
Theorem 7.5. Denote W_{n,p} = (k_0 N_n (p-d) p)^2. Assume that the conditions of Theorem 7.4 hold, w_n -> infinity, w_n / m_n = o(1), and that there exist q* >= l and ε > 0 such that Θ_n := (w_n^{-1} + (w_n/m_n) W_{n,p}^{1/q*}) W_{n,p}^ε = o(1), and
(i) max_{1<=i<=n} max_{1<=j<=p} ||x_{i,j,n}||_{L^{2q*}} <= M for some sufficiently large constant M;
(ii) condition (e) of Theorem 7.4 holds with l replaced by q*.
Then we have that, conditional on (x_{i,n}) and under H_0, the bootstrap distribution of |κ̂|_∞ approximates that of |y|_∞ in the sense of (7.11). The convergence rate in (7.11) goes to zero, and hence the bootstrap is asymptotically consistent, provided Θ_n = o(1); see the discussion below (7.8) in Section 7.2.

8 Power analysis
In this section we discuss the local power of our bootstrap-assisted testing algorithm of Section 7.3 for testing static factor loadings. For two matrices A and B, let A ∘ B be the Hadamard product of A and B. Let J be the p x d matrix with all entries 1, where d = rank(A). To simplify the proof, we further assume the following condition (G):
(G1): e_{i,n} is independent of z_{s,n} and e_{s,n} for i > s.
(G2)-(G3) are conditions on the factor and error processes whose displays involve all p-dimensional vectors f with ||f||_F = 1. Conditions (G1) and (G2) are mild assumptions on the error and factor processes. Condition (G3) controls the strength of temporal dependence of projections of the error process (e_{i,n}); it essentially requires that each component of e_{i,n} be a short-memory process and that the dependence among the components of e_{i,n} not be too strong. Elementary calculation shows that condition (G3) is satisfied under two important scenarios. The first is that the dependence measures satisfy max_{1<=j<=p} δ_{H,4,j}(k) = O(χ^k) for some χ in (0,1); see Lemma 8.1 below. The second is that the components of the process (e_{i,n}) are independent of each other, i.e., H_{j_1}(., F_u) is independent of H_{j_2}(., F_v) for all u, v in N as long as j_1 ≠ j_2, 1 <= j_1, j_2 <= p. We consider the following class of local alternatives:
A_n(t) = A ∘ (J + ρ_n D(t)), (8.3)
where the entries of D(t) are of order p^{(δ-δ_1)/2} for some δ_1 in [δ,1], and ρ_n ↓ 0 controls the magnitude of deviation from the null. Notice that, under the alternative hypothesis, A_n(t) is a function of ρ_n; we therefore write γ(ρ_n, k) for the corresponding integrated lag-k auto-covariance, and let Γ_k(ρ_n) = γ(ρ_n,k) γ'(ρ_n,k) and Γ(ρ_n) = Σ_{k=1}^{k_0} Γ_k(ρ_n).
To simplify notation, write Γ(ρ_n), ρ_n > 0, as Γ_n, with orthogonal basis (f_{1,n}, ..., f_{p-d,n}) := F_n. Recall the centered k_0 N_n (p-d) p-dimensional Gaussian random vectors y and y_i, and the definition of l̃_i in Section 7.2. Define ľ_i in analogy with l̃_i, with F replaced by F_n. Theorem 8.1, with parts (a) and (b), then shows that under the local alternatives (8.3) the statistic T̂_n diverges at the rate sqrt(m_n) ρ_n p^{(1-δ_1)/2} while the bootstrap critical values remain of smaller stochastic order, so that the test is consistent against such alternatives.

9 Selection of Tuning Parameters
9.1 Selection of J_n for the estimation of time-varying factor loading matrices
We discuss the selection of J_n for the estimation of the time-varying factor loadings. Since in practice g_{J_n,K,M} is unknown, a data-driven method to select J_n is desired. By Theorem 7.2, the residuals are ê_{i,n} = x_{i,n} - V̂(i/n) V̂'(i/n) x_{i,n}, where ê_{i,n} = (ê_{i,1,n}, ..., ê_{i,p,n})' is a p-dimensional vector. We select J_n as the minimizer of the following cross-validation criterion:
CV(J) = Σ_{i=1}^n Σ_{s=1}^p (ê_{i,s,n}(J) / (1 - v_{i,s}(J)))^2,
where v_{i,s}(J) is the s-th diagonal element of V̂(i/n) V̂'(i/n) obtained by setting J_n = J, and ê_{i,s,n}(J), 1 <= i <= n, 1 <= s <= p, are the residuals calculated when J_n = J. Cross-validation has been widely used in the literature on sieve nonparametric estimation and has been advocated by, for example, Hansen (2014). We choose J_n = argmin_J CV(J) in our empirical studies.
9.2 Selection of the tuning parameters m_n and w_n for testing static factor loadings
We select m_n by first selecting N_n and letting m_n = (n - k_0)/N_n. N_n is chosen by the minimal volatility method as follows. For a given data set, let T̂(N_n) be the test statistic obtained by using the integer parameter N_n, with d̂_n = argmin_{1<=i<=R} λ_{i+1}(Γ̂)/λ_i(Γ̂), where R is defined in (6.6). Consider a set of possible values for N_n, denoted {J_1, ..., J_s}, where the J_v are positive integers. For each J_v, 1 <= v <= s, we calculate T̂(J_v) and hence the local standard error SE (see (9.3)), and choose the N_n that stabilizes the test statistic according to (9.4). The idea behind the minimal volatility method is that the test statistic should behave stably as a function of N_n when the latter parameter is in an appropriate range. In our empirical studies we find that the proposed method performs reasonably well, and the results are not sensitive to the choice of N_n as long as the N_n used is not very different from that chosen by (9.4). After choosing N_n, and hence m_n, we further choose w_n, again by the minimal volatility method. In this case we first obtain the k_0 N_n (p-d) p-dimensional vectors {l̂_i, 1 <= i <= m_n} defined in Section 6. Then we select w_n by a multivariate extension of the minimal volatility method in Zhou (2013) as follows. We consider choosing w_n from a grid w_1 <= ... <= w_r. For each w_n = w_i we compute the corresponding k_0 N_n (p-d) p (m_n - w_r + 1)-dimensional vector B_i^o of block sums with window size w_i, and let B be the k_0 N_n (p-d) p (m_n - w_r + 1) x r matrix with i-th column B_i^o. Then for each row, say the i-th row B_{i,.} of B, we calculate the local standard error SE(B_{i,.}, h) for a given window size h (see (9.3) for the definition of SE), and therefore obtain an (r - 2h)-dimensional row vector (SE(B_{i,h+1}, h), ..., SE(B_{i,r-h}, h)). Stacking these rows into a matrix B†, let colmax(B†) be the (r - 2h)-dimensional vector whose i-th element is the maximum entry of the i-th column of B†. Then we choose w_n = w_{k+h} if the smallest entry of colmax(B†) is its k-th element.
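The minimal volatility rule of Section 9.2 can be sketched generically as follows: compute the quantity of interest on a grid of candidate tuning parameters and select the candidate around which it is most stable. The window h and the function stat (e.g., the test statistic as a function of N_n) are placeholders to be supplied by the user; this is a simplified univariate version of the rule described above.

```python
import numpy as np

def minimal_volatility(candidates, stat, h=3):
    # Choose the candidate minimizing the local standard error of stat over
    # a moving window of 2h+1 neighboring candidates (cf. Section 9.2).
    vals = np.array([stat(c) for c in candidates])
    se = [np.std(vals[j - h:j + h + 1], ddof=1)
          for j in range(h, len(vals) - h)]
    return candidates[h + int(np.argmin(se))]

# Usage (hypothetical): N_n = minimal_volatility(range(2, 21), compute_test_stat)
```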
10 Simulation Studies
10.1 Estimating the time-varying factor models
In this subsection we examine the performance of our proposed estimator (5.6) for time-varying factor models and compare it with that of Lam et al. (2011); the latter is equivalent to fixing J_n = 0 in (5.3). We use normalized Legendre polynomials as our basis throughout our empirical studies. The method studied in Lam et al. (2011) is developed under the assumption of stationarity with static factor loadings, and hence the purpose of our simulation is to illustrate that methodology developed under stationarity does not directly carry over to the non-stationary setting. To demonstrate the advantage of the adaptive sieve method, our method is also compared with a simple local estimator of Λ(t), which was considered in the data analysis section of Lam et al. (2011) and which we call the local PCA method in this paper. Specifically, for each i, Λ(i/n) is consistently estimated by a localized version of (5.3) computed from the m observations nearest in time to i, where m is a window size such that m -> infinity and m = o(n). The J_n of our method is selected by cross-validation, while the m of the local PCA method is selected by the minimal volatility method. Define the following smooth functions used in the data-generating process. Let A = (a_1, ..., a_p)' be a p x 1 matrix with a_i = 1 + 0.2 (i/p)^{0.5}. Define the locally stationary process z_{i,n} driven by (..., ε_{i-1}, ε_i), where (ε_i)_{i in Z} is a sequence of i.i.d. N(0,1) random variables. We then define the time-varying matrix A(t) built from A_1, A_2, A_3 and A_4, the sub-matrices of A consisting of the first round(p/5) rows, the (round(p/5)+1)-th to round(2p/5)-th rows, the (round(2p/5)+1)-th to round(3p/5)-th rows, and the (round(3p/5)+1)-th to p-th rows of A, respectively. Let e_{i,n} = (e_{i,1}, ..., e_{i,p})' be a p x 1 vector with (e_{i,j})_{i in Z, 1<=j<=p} i.i.d. N(0, 0.5^2) and independent of (ε_i)_{i in Z}. We consider the cases p = 50, 100, 200, 500 and n = 1000, 1500. The performance of the methods is measured in terms of the root-mean-square error (RMSE) and the average principal angle. The principal angle between A(i/n) and its estimate Â(i/n) is defined as Υ_{i,n}, the angle between the normalized versions of A(i/n) and Â(i/n); the principal angle is a well-defined distance between the spaces span(A(i/n)) and span(Â(i/n)). Finally, the average principal angle is defined as Ῡ = (1/n) Σ_{i=1}^n Υ_{i,n}. We present the RMSE and the average principal angle of the three estimators using 100 simulation samples in Tables 10.1 and 10.2, respectively. Our method achieves the minimal RMSE and average principal angle in all simulation scenarios among the three estimators. We choose k_0 = 3 in our simulations; other choices k_0 = 1, 2, 4 yield similar results and are not reported here. As predicted by Theorem 7.2 with δ = 0, the RMSE in Table 10.1 decreases as n and p increase, while the average principal angle decreases as n increases and is independent of p.
10.2 Testing static factor loadings
We now examine our testing procedure of Section 6 for the hypothesis of static factor loadings via B = 1000 bootstrap samples. Define g_1(t) = 0.1 + 0.06 t^2, g_2(t) = 0.12 + 0.04 t^2, g_3(t) ≡ 0.15. (10.7) Let A be a p x 3 matrix with each element generated from 2U(-1,1), and define A(t) from A_1, A_2 and A_3, the sub-matrices of A consisting of the first round(p/3) rows, the (round(p/3)+1)-th to round(2p/3)-th rows, and the (round(2p/3)+1)-th to p-th rows, respectively. The factors are z_{i,n} = (z_{i,1,n}, z_{i,2,n}, z_{i,3,n})', where z_{i,k,n} = 2 Σ_{j=0}^∞ g_k^j(i/n) ε_{i-j,k} for k = 1, 2, 3, and the (ε_{i,k}) are i.i.d. standard normal. We consider several different sub-models for e_{i,n} = (e_{i,1,n}, ..., e_{i,p,n})'.
10.3 Power performance
In this subsection we examine the power performance of our testing procedure of Section 6 via B = 1000 bootstrap samples. Let Ã be a p x 1 matrix with each entry 1, and let Ã_i, 1 <= i <= 5, be the sub-matrices of Ã consisting of the (round((i-1)p/5)+1)-th to round(ip/5)-th rows of Ã.
We then consider a time-varying normalized loading matrix A(t) as follows. For a given D in R, let A(t) be as specified in (10.10). Let (e_{i,s,n})_{i in Z}, 1 <= s <= p, be i.i.d. N(0, 0.5^2), and e_{i,n} = (e_{i,s,n}, 1 <= s <= p)'. Let (ε_i)_{i in Z} be i.i.d. N(0,1), independent of (e_{i,s,n})_{i in Z}, 1 <= s <= p. The locally stationary process z_{i,n} considered in this subsection is z_{i,n} = 4 Σ_{j=0}^∞ g^j(i/n) ε_{i-j}, where the coefficient function is g(t) = 0.2 + 0.2 t. Observe that D = 0 corresponds to the situation where A(t) is time-invariant, and as D increases we have larger deviations from the null. We examine the cases p = 25, 50, 100, 150 and n = 1500 while increasing D from 0 to 0.5. The results are displayed in Table 10.4, which shows that our test has decent power. Each simulated rejection rate is calculated from 100 simulated samples, while for each sample the critical value is generated by the block multiplier bootstrap algorithm of Section 6 using B = 1000 bootstrap samples.
11 Data analysis
We first study a financial time series x̃_t. Due to the well-documented unit-root non-stationarity of x̃_t (see e.g. Fengler et al. (2007)), we analyze the 50-dimensional differenced time series x_t := x̃_t - x̃_{t-1}. In our data analysis we choose k_0 = 3. Using the minimal volatility method we select N_n = 4, where N_n = (T-3)/m_n is the number of non-overlapping equal-sized blocks, and choose window size w_n = 5. Via B = 10000 bootstrap samples we obtain a p-value of 6.08%. The small p-value provides moderate to strong evidence against the null hypothesis of static factor loadings. We then apply the sieve estimator of Section 5 to estimate the time-varying loading matrix. The cross-validation method suggests the use of the Legendre polynomial basis up to order 6. We find that during the considered period the number of factors, or the rank of the loading matrix, is either one or two at each time point. In Figure 11.1 we display the estimated number of factors at each time, and in Figure 11.2 the percentage of the trace of Λ(t) explained by the leading eigenvalues.
We next consider the daily state-level new deaths of COVID-19 in the United States. The data can be obtained from https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State. The series starts on Feb. 26, 2020, since Feb. 27, 2020 is the first day on which a death case appears in this dataset, and ends on Oct. 8, 2021. We denote this 60-dimensional vector series of length 591 by ỹ_t, t = 1, 2, ..., 591. We study the differenced death series
y_t = ỹ_t - ỹ_{t-1} (11.11)
using our non-stationary factor model. We apply the test of Section 6 to examine whether the loading matrix of y_t is static. We choose k_0 = 3. Selecting the number of non-overlapping equal-sized blocks N_n = 7 and window size w_n = 24 via the minimal volatility method, through B = 10000 bootstrap samples we obtain a p-value of 0.073%, which provides strong evidence against the null hypothesis of static factor loadings. The sieve estimator of Section 5 is then applied to estimate the time-varying loading matrix of y_t, with the cross-validation method suggesting the use of the Legendre polynomial basis up to order 7. During the considered period the number of factors varies from 1 to 5. For the state-level differenced new deaths of COVID-19 in the USA, we depict the estimated number of factors in Figure 11.3 and the percentage of the trace of Λ(t) explained by the two leading eigenvalues in Figure 11.4. The results indicate that the loading matrix, as well as the number of factors, is time-varying, and that the two leading eigenvalues of Λ(t) do not dominate the sum of all eigenvalues of Λ(t) in certain periods.
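For reference, here is a schematic end-to-end version of the Section 6 test as applied in the two data analyses above, under our reading of the block statistics and the multiplier bootstrap; the normalizations and index conventions here are assumptions, so the sketch should be checked against the paper before serious use.

```python
import numpy as np

def static_loading_test(x, F_hat, k0=3, Nn=4, wn=5, B=1000, seed=0):
    # x: (n, p) differenced series; F_hat: p x (p - d_hat) null-space basis of Gamma_hat.
    rng = np.random.default_rng(seed)
    n, p = x.shape
    mn = (n - k0) // Nn
    rows = []
    for i in range(mn):                     # build the vectors l_hat_i of Section 6
        pieces = []
        for h in range(Nn - 1):
            for k in range(1, k0 + 1):
                idx = h * mn + i
                pieces.append(np.outer(F_hat.T @ x[idx + k], x[idx]).ravel())
        rows.append(np.concatenate(pieces))
    L = np.array(rows)                                    # (mn, k0*(Nn-1)*(p-d)*p)
    T_hat = np.abs(L.sum(axis=0)).max() / np.sqrt(mn)     # test statistic, cf. (6.2)
    S = np.cumsum(L, axis=0)                              # partial sums of l_hat_i
    blocks = S[wn - 1:] - np.vstack([np.zeros(L.shape[1]), S[:-wn]])  # s_{j,wn}
    centered = blocks - (wn / mn) * S[-1]
    scale = np.sqrt(wn * len(centered))
    K = np.array([np.abs(rng.standard_normal(len(centered)) @ centered).max() / scale
                  for _ in range(B)])                     # bootstrap copies |kappa|_inf
    return T_hat, float((K >= T_hat).mean())              # statistic and p-value
```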
Acknowledgments. The work of the first author was supported by the NSFC Young Scientists program (No. 11901337). The work of the second author was supported by NSERC of Canada. The online supplement contains the proofs of the auxiliary lemmas for Theorems 7.1, 7.2 and 7.3, of Propositions 7.1 and 7.2, and of Theorem 7.5, Lemma 8.1 and Theorem 8.1.
We now present some additional notation for the proofs. For any random variable W = W(t, F_i), write W(t, F_{i,g}) for the variable obtained by replacing ε_g in F_i with ε'_g, i.e., F_{i,g} = (..., ε_{g-1}, ε'_g, ε_{g+1}, ..., ε_i), where (ε'_i)_{i in Z} is an i.i.d. copy of (ε_i)_{i in Z}. Recall that {l_1, ..., l_{d(i/n)}} is the index set of factors defined in Section 4.
Proof of Theorem 7.1. Notice that for each k in {1, ..., k_0} we have the decomposition (S.1). Notice also that by Jensen's inequality, the Cauchy-Schwarz inequality and conditions (C1), (M2), (S1)-(S4), the bounds (S.2) and (S.3) hold uniformly for t in [0,1]. Then the theorem follows from equations (S.1), (S.2) and (S.3).
Proof of Theorem 7.2. We first prove (i). It suffices to show that the d(t)-th largest eigenvalue of Λ(t) satisfies
inf_{t in T_{η_n}} λ_{d(t)}(Λ(t)) ≳ η_n p^{2-2δ}. (S.4)
Then the theorem follows from Theorem 7.1, (S.4) and Theorem 1 of Yu et al. (2015). We now show (S.4). Λ(t) of the main article can be written in a block form, and we now discuss λ_{d(t)}(Λ(t)). Write
Σ_R(t) = (Σ_{Rz,e}(t,1), ..., Σ_{Rz,e}(t,k_0)). (S.8)
Therefore we obtain the representation (S.9), where ⊗ denotes the Kronecker product. On the other hand, by (C1) and conditions (M1), the relevant singular values are bounded below, where σ_{d(t)}(Σ) denotes the d(t)-th largest singular value of a matrix Σ. Therefore, using (S.9) and the arguments in the proof of Theorem 1 of Lam et al. (2011), we have
inf_{t in T_{η_n}} λ_{d(t)}(Λ(t)) ≳ η_n p^{2-2δ}. (S.10)
This shows (S.4), and assertion (i) of the theorem follows. Assertion (ii) follows from result (i), condition (C1) and an argument similar to the proof of Theorem 3 of Lam et al. (2011). Details are omitted for the sake of brevity.
Proof of Theorem 7.3. Assertion (i) follows immediately from Lemma A.1 in the online supplement and Theorem 7.1. For (ii), notice that by Theorem 7.2 we have the corresponding bound, where B(t), B̂(t) and Ô_2(t) are defined in Theorem 7.2. Since Ô_2(t) is a (p-d(t)) x (p-d(t)) orthonormal matrix, it is easy to check that the columns of B(t) Ô_2(t) form an orthonormal basis of the null space of Λ(t). Using this fact, we then follow the decomposition (A.5) of Lam and Yao (2012) to obtain the required bounds for some sufficiently large constants M_1 and M_2. For K_{1,j}(t), by (S.3) and (S.13) we obtain (S.14). Notice that for any two random variables X, Y with finite first moments and any c_1, c_2 > 0, a product bound follows from the Markov inequality. Meanwhile, η_n^{-1}(p ν_n)^2 = p^{2-δ} ν_n (η_n^{-1} p^δ ν_n). Therefore, by (S.17) and the results of Theorem 7.1, Theorem 7.2 and (S.14), assertion (ii) follows. Next, by (S.2) we shall see that (S.19) holds, and then by (i) the bound (S.20) follows, where M̃_0 is a sufficiently large constant such that η_n p^{2-2δ}/M̃_0 < (1/2) inf_{t in T_{η_n}} λ_{d(t)}(Λ(t)). Therefore, by (i) and the Markov inequality, (iii) holds. Finally, (iv) follows from (i), (ii), (S.21) and (S.17).
Proof of Proposition 7.3. By Proposition 7.2, it suffices to prove everything on the event A_n := {d̂_n = d}. Observe that for 1 <= s_1 <= N_n - 1, 1 <= s_2 <= p-d, 1 <= s_3 <= p and 1 <= w <= k_0, the (k_0 (p-d) p s_1 + (w-1) p (p-d) + (s_2-1) p + s_3)-th entries of l̂_i and l̃_i are f̂'_{s_2} x_{s_1 m_n+i+w,n} x_{s_1 m_n+i,s_3} and f'_{s_2} x_{s_1 m_n+i+w,n} x_{s_1 m_n+i,s_3}, respectively, where x_{i,s} denotes the s-th entry of x_{i,n}. Then, on the event A_n,
|T̂_n - T̃_n| <= max_{s_1,s_2,s_3,w} |Σ_{i=1}^{m_n} (f̂_{s_2} - f_{s_2})' x_{s_1 m_n+i+w,n} x_{s_1 m_n+i,s_3}| / sqrt(m_n).
By the triangle inequality, condition (M) and the Cauchy inequality, this maximum can be bounded in terms of f_{s_2,u} and f̂_{s_2,u}, the u-th entries of f_{s_2} and f̂_{s_2}, respectively.
Moreover, for 1 <= s_1 <= N_n - 1 and 1 <= s_3 <= p, the quantities (Σ_u (Σ_{i=1}^{m_n} x_{s_1 m_n+i+w,u} x_{s_1 m_n+i,s_3})^2)^{1/2} can be bounded in the L^{l/2} norm; using a bound on the expectation of the corresponding maximum, the proposition follows.
Proof of Theorem 7.4. We first show (7.6). To show (7.6) we verify the conditions of Corollary 2.2 of Zhang and Cheng (2018). Recall from the proof of Proposition 7.3 the indexing of the entries of l̃_i for 1 <= s_1 <= N_n - 1. Notice that (S.28) satisfies (10) of Assumption 2.3 of Zhang and Cheng (2018). Furthermore, by the triangle inequality, condition (M) and the Cauchy inequality, under the null hypothesis we have
||f'_{s_2} x_{s_1 m_n+i+w,n} x_{s_1 m_n+i,s_3}||_{L^l} <= ||f'_{s_2} e_{s_1 m_n+i+w}||_{L^{2l}} ||x_{s_1 m_n+i,s_3}||_{L^{2l}}.
By the triangle inequality and condition (M), ||x_{s_1 m_n+i,s_3}||_{L^{2l}} is bounded for all s_1, i and s_3. By (S.30), which holds uniformly in i, n and p due to condition (f), this leads to
Σ_{1<=i<=m_n} E(max_{1<=s_1<=N_n-1, 1<=s_3<=p, 1<=w<=k_0, 1<=s_2<=p-d} |f'_{s_2} x_{s_1 m_n+i+w,n} x_{s_1 m_n+i,s_3}|^4) ≲ (N_n p^2)^{4/l}, (S.31)
which together with condition (c) shows that equation (12) of Corollary 2.2 of Zhang and Cheng (2018) holds. Finally, using (S.28), (S.31) and arguments similar to Lemma A.3 in the online supplement and Lemma 5 of Zhou and Wu (2010), we see that max_{1<=j<=k_0 N_n (p-d) p} σ_{j,j} <= M < infinity for some constant M. This verifies (9) of Zhang and Cheng (2018), which proves the first claim of the theorem. To see the second claim of the theorem, write δ_0 = g_n^2 p^δ sqrt(m_n) Ω_n / sqrt(n), where g_n is a diverging sequence. Following the first assertion of Theorem 7.4 and Proposition 7.3, we obtain (S.34). Since |y|_∞ = max_j max(y_j, -y_j), by Corollary 1 of Chernozhukov et al. (2015) we obtain (S.35). Combining (S.34) and (S.35), and letting g_n = (p^δ sqrt(m_n) log n Ω_n / sqrt(n))^{-1/3}, the second claim of Theorem 7.4 follows.

1 Proofs of Proposition 7.1 and auxiliary lemmas for Theorems 7.1, 7.2 and 7.3
Proof of Proposition 7.1. Consider g_n = η_n^2 / (ν_n p^δ log n); then the conditions of Theorem 7.3 hold. Since Σ_{j=1}^p λ_j^2(Λ̂(t)) = ||Λ̂(t)||_F^2, we discuss the magnitude of ||Λ̂(t)||_F. First, notice that by a similar and simpler argument than in the proof of Theorem 7.2, inf_{t in T_{η_n}} ||Λ(t)||_F ≳ p^{2-2δ}. By Theorem 7.1 and (S.19) in the main article, the estimation error is of smaller order (A.1). Define the events as in the proof of Theorem 7.3 in the main article, where M̃_0 is a sufficiently large constant such that η_n p^{2-2δ}/M̃_0 < (1/2) inf_{t in T_{η_n}} λ_{d(t)}(Λ(t)); by the proof of Theorem 7.3, these events hold with probability approaching one. Notice that on A_n, by the choice of R(t), the variance proportion is bounded below for some sufficiently small positive constant C_0, where the last inequality is due to (A.1). Because on A_n the relevant eigenvalue ratio is at least η_n (cf. (S.22) in the main article), and since p^{2δ} g_n^2 ν_n^2 / η_n^3 = o(min(η_n, (log n)^{-1})), the stated consistency follows. Therefore the proposition holds by plugging g_n = η_n^2 / (ν_n p^δ log n) into the above equation.
The following lemma on unitarily invariant norms is useful for proving Theorem 7.3.
Lemma A.1. Let M(n) be the space of all n x n (complex) matrices. A norm ||.|| on M(n) is said to be unitarily invariant if ||A|| = ||UAV|| for any two unitary matrices U and V. Denote by Eig(A) the unordered n-tuple consisting of the eigenvalues of A, each counted as many times as its multiplicity.
Proof (of Lemma A.2). By definition we have the stated identity for 1 <= u <= p. Notice that here d = sup_{0<=t<=1} d(t) is fixed. Therefore, assumptions (A1), (M2) and the triangle inequality lead to the first statement on the boundedness of the fourth moments in (A.7). Finally, (A1) and (M3) lead to assertion (A.8).
Lemma A.3. Consider the process z_{i,u,n} z_{i+k,v,n} for some k > 0 and 1 <= u, v <= d.
Then, under conditions (C1), (M1)-(M3), we have:
(i) ζ_{i,u,v} =: ζ_{u,v}(i/n, F_i) = z_{i,u,n} z_{i+k,v,n} is a locally stationary process with associated dependence measures
δ_{ζ_{u,v},2}(h) <= C (δ_{Q,4}(h) + δ_{Q,4}(h+k)) (A.9)
for some universal constant C > 0 independent of u, v and any integer h;
(ii) for any sequence of numbers a_i, 1 <= i <= n, the corresponding maximal inequality (A.10) holds.
Proof. (i) is a consequence of the Cauchy-Schwarz inequality, the triangle inequality and conditions (M1)-(M3). By the property of martingale differences and (i) of this lemma, we obtain (A.11) and (A.12). The lemma then follows from the triangle inequality and inequalities (A.11) and (A.12).
Corollary 1.1. Under conditions (C1), (M1) and (M2) we have, for each fixed k > 0, 1 <= u, v <= p and 1 <= w <= d, that ψ_i =: ψ(i/n, F_i) = e_{i+k,u,n} e_{i,v,n}, φ_i =: φ(i/n, F_i) = z_{i+k,w,n} e_{i,v,n} and ι_i =: ι(i/n, F_i) = e_{i+k,u,n} z_{i,w,n} are locally stationary processes with associated dependence measures
max(δ_{ψ,2}(h), δ_{φ,2}(h), δ_{ι,2}(h)) <= C (δ_{Q,4}(h) + δ_{H,4}(h) + δ_{Q,4}(h+k) + δ_{H,4}(h+k)) (A.13)
for some universal constant C > 0 independent of u, v and w.
Proof. The corollary follows from the same proof as Lemma A.3.
To save notation in the following proofs, for given J_n and k we write M̂(J_n,t,k) as M̂ if no confusion arises. Recall the definitions of Σ̂_{x,j,k} and M̂(J_n,t,k) in (5.5) and (5.4) of the main article. Observe the following decomposition:
Σ̂_{x,j,k} = V_{1,j,k} + V_{2,j,k} + V_{3,j,k} + V_{4,j,k}. (A.14)
Proof. Using equations (A.14)-(A.17), M̂ decomposes into terms Ṽ_s(t), s = 1, 2, 3, 4. Consider the case s = 1; it follows from the triangle inequality and Lemma A.3 that the required bound holds for some sufficiently large constant C. Consequently, by (C1) and Jensen's inequality we get the stated bound for 1 <= j <= J_n. On the other hand, since the (u,v)-th element of Ṽ_1(t), denoted Ṽ_{1,u,v}(t), satisfies the analogous identity, by Jensen's inequality the claimed rate follows.
Proof. Consider the (u,v)-th element of (E M̂(J_n,t,k) - Σ*_k(t)). By definition we have, for 1 <= u, v <= p, the stated expansion, where Σ*_k is defined in Lemma A.5.
Proof. Notice that by definition both families are orthonormal bases of R^p.

2 Proofs for Section 6
Recall that Γ̂ = Σ_{k=1}^{k_0} Γ̂_k and Γ = Σ_{k=1}^{k_0} Γ_k from Section 6 of the main article. Since we are testing the null hypothesis of static factor loadings and are considering local alternatives where A(.) deviates slightly from the null, we have assumed d(t) ≡ d for the test of static factor loadings in the main article, i.e., the number of factors is fixed; therefore we assume that η_n in conditions (C1) and (S) satisfies η_n >= η_0 > 0 for some positive constant η_0, and T_{η_n} = (0,1), which coincides with the discussion in Remark 7.3 of the main article.
Proof (of Corollary B.1). It suffices to show that (B.2) holds uniformly for 1 <= k <= k_0. By the proof of Lemma A.4, (B.3) follows for 1 <= k <= k_0; by the proof of Lemma A.5, (B.4) and (B.5) follow, where to prove (B.5) we have used Lemma A.2. Then by (B.3)-(B.5) we obtain the required bound, which together with (S.2) in the main article and the definition of Γ_k proves (B.2). Therefore the corollary holds.
Corollary B.2. Assume conditions (A1), (A2), (C1), (M1), (M2), (M3) and (S1)-(S4). Then there exist orthogonal matrices Ô_3 in R^{d x d} and Ô_4 in R^{(p-d) x (p-d)} such that the estimated eigenvector matrices are consistent up to these rotations.
Proof. By (S.2) in the main article and the triangle inequality we obtain (B.7). On the other hand, by Weyl's inequality and (12) of Lam et al. (2011) we obtain (B.8). By (B.7), (B.8) and the proof of Theorem 7.2 we obtain (B.9). As a result, the corollary follows from Corollary B.1, (B.9) and Theorem 1 of Yu et al. (2015).
Details are omitted for the sake of brevity.
Proof. Notice that by Corollary B.2 there exists a set of orthonormal basis vectors G = (G_1, ..., G_{p-d}) with the required property. Take F = G Ô_4 and the corollary is proved.
Recall from Section 7.2 of the main article the k_0 N_n (p-d) p-dimensional vectors l̃_i, which replace F̂ with F in the l̂_i of (6.3). In the following proofs, define s_{j,w_n} = Σ_{r=j}^{j+w_n-1} l̃_r for 1 <= j <= m_n - w_n + 1 and s_{m_n} = Σ_{i=1}^{m_n} l̃_i. Notice that by conditions (S1) and (S3), the l̃_i, 1 <= i <= m_n, are mean-zero k_0 N_n (p-d) p-dimensional vectors.
Proof of Theorem 7.5. Define
υ_n = (w_n (m_n - w_n + 1))^{-1/2} Σ_{j=1}^{m_n-w_n+1} (s_{j,w_n} - (w_n/m_n) s_{m_n}) R_j.
We shall show the two assertions (B.13) and (B.14); the theorem then follows from (B.13), (B.14) and Theorem 7.4. In the remaining proofs, we write "conditional on {x_{i,n}, 1 <= i <= n}" as "conditional on x_{i,n}" for short if no confusion arises.
Step (i). We compare σ^υ_{u,v} and σ^Y_{u,v}, the (u,v)-th entries of the covariance matrix of υ_n given the l̃_i and of the covariance matrix of y, respectively. Notice that (B.15), together with condition (d), implies that there exists a constant η_0 > 0 bounding the variances from below. Since by assumption w_n^{-1} + (w_n/m_n) W_{n,p}^{1/q*} vanishes, (B.13) will follow. Now we prove (B.15). Let S_{j,w_n,s} and S_{m_n,s} be the s-th elements of the vectors s_{j,w_n} and s_{m_n}, respectively; by our construction, (B.17) holds. On the one hand, following the proof of Lemma 4 of Zhou (2013) and using conditions (i), (ii) and Lemma 5 of Zhou and Wu (2010), we obtain (B.18). Now using (i), (ii) and an argument similar to the proof of Lemma 1 of Zhou (2013), the Cauchy-Schwarz inequality and the fact that max_{1<=i<=m_n} |X_i|^{q*} <= Σ_{i=1}^{m_n} |X_i|^{q*} for any random variables {X_i}_{1<=i<=m_n}, we obtain the remaining bounds.
Step (ii). We now show (B.14). It suffices to work on the event {d̂_n = d}. We first show (B.23); after that we show, for ε in (0, infinity),
sup_{t in R} P(|υ_n - t| <= ε | x_{i,n}) = O_P(ε sqrt(log(n/ε))). (B.24)
Combining (B.23) and (B.24), and following the argument from (S.32) to (S.34) in the main article with ε = ((N_n p)^{1/l})^{l/(l+1)} log^{-1/(2l+2)} n, (B.14) follows. To show (B.23) it suffices to prove (B.26) and (B.27); we show (B.26), and (B.27) follows mutatis mutandis. Define Ŝ_{j,w_n,r} and S_{j,w_n,r} as the r-th components of the k_0 N_n (p-d) p-dimensional vectors ŝ_{j,w_n} and s_{j,w_n}. Using the notation of the proof of Theorem 7.4, the weighted sum over blocks multiplied by the R_j is a Gaussian random vector for 1 <= s_1 <= N_n - 1, 1 <= s_3 <= p, 1 <= w <= k_0, whose extreme-value behavior depends on its covariance structure. Therefore we study Σ_{j=1}^{m_n-w_n+1} (Σ_{i=j}^{j+w_n-1} x_{s_1 m_n+i+w,n} x_{s_1 m_n+i,s_3})^2. For this purpose, for any random variable Y write (E(|Y|^l | x_{i,n}))^{1/l} = ||Y||_{L^l, x_{i,n}}.

3 Proofs of Lemma 8.1 and Theorem 8.1
Proof of Lemma 8.1. Let f = (f_1, ..., f_p)' be a p-dimensional vector such that ||f||_F = 1. Notice that by condition (f) there exists a constant M_1 such that the projection bound (C.1) holds. On the other hand, the triangle inequality yields (C.2), where the last equality is due to the Cauchy-Schwarz inequality and condition (M). As a consequence, (C.1) and (C.2) lead to (C.3). Notice that when k = a log p for some sufficiently large constant a, sqrt(p) χ^k = O(1). Therefore the lemma follows from (C.3), which finishes the proof.
Recall the quantities ľ_i and l̃_i defined in Section 8, and l̂_i in (6.4) of the main article. To prove the power result in Theorem 8.1 we need to evaluate the magnitude of E l̂_i under the local alternatives, which is determined by E l̃_i and E ľ_i. Define x*_i = A z_{i,n} + e_{i,n}, and further define l*_i by replacing x_{i,n} in l̃_i with x*_{i,n}, where under the local alternative (8.3) of the main article x_{i,n} = A ∘ (J + ρ_n D(t)) z_{i,n} + e_{i,n}.
Notice that, under the considered alternative, $y_i$ in fact preserves the auto-covariance structure of $\tilde{l}^*_i$. We then prove Theorem 8.1 by arguments similar to the proofs of Theorems 7.4 and 7.5, but under local alternatives. We first present the auxiliary Propositions C.1 and C.2.

Proposition C.1. Write $\Gamma(0)$ as $\Gamma$. Under the conditions of Theorem 8.1, for each $n$ there exists an orthonormal basis $F = (f_1, \ldots, f_{p-d})$ such that
\[
\|F_n - F\|_F = O\big(\rho_n p^{(\delta - \delta_1)/2}\big), \tag{C.4}
\]
where $F$ spans the kernel of $\Gamma$. Furthermore, (C.5) holds.

Proof. By the definitions of $\Gamma$ and $\Gamma_n$, it follows that $\|\Gamma_n - \Gamma\|_F = O\big(\rho_n p^{(2-\delta-\delta_1)/2}\big)$, which shows (C.4). By an argument similar to Corollary B.1 and the triangle inequality, we have
\[
\big\|\,\|\hat{\Gamma} - \Gamma\|_F\big\|_{L^1} \le \big\|\,\|\hat{\Gamma} - \Gamma_n\|_F\big\|_{L^1} + \big\|\,\|\Gamma_n - \Gamma\|_F\big\|_{L^1} = O\Big(\big(\rho_n p^{(\delta-\delta_1)/2} + \tfrac{p^{\delta}}{\sqrt{n}}\big) p^{2-2\delta}\Big), \tag{C.8}
\]
which further, by the argument of (C.6), yields (C.9) and (C.10). Therefore (C.5) holds.

Proof of Proposition C.2. To show the first equation of (C.11), notice that for $1 \le s_1 \le N_n - 1$, $1 \le s_2 \le p - d$, $1 \le s_3 \le p$, $1 \le w \le k_0$, the $\big(k_0(p-d)p s_1 + (w-1)p(p-d) + (s_2-1)p + s_3\big)$th entry of $E\tilde{l}_i$ is $E\big(f_{s_2}' x_{s_1 m_n + i + w, n}\, x_{s_1 m_n + i, s_3}\big)$, where $x_{i,s}$ denotes the $s$th entry of $x_{i,n}$. Then by the Cauchy–Schwarz inequality there exists a constant $M$ such that, uniformly for $1 \le s_1 \le N_n - 1$, $1 \le s_2 \le p - d$, $1 \le s_3 \le p$, $1 \le w \le k_0$,
\[
\big|E\big(f_{s_2}' x_{s_1 m_n + i + w, n}\, x_{s_1 m_n + i, s_3}\big)\big| = \big|E\big(f_{s_2}'\big(\rho_n (D(t_{i,w,n}) \circ A) z_{s_1 m_n + i + w, n} + e_{s_1 m_n + i + w, n}\big)\, x_{s_1 m_n + i, s_3}\big)\big| \le \rho_n M p^{\frac{1 - \delta_1}{2}}, \tag{C.12}
\]
where $t_{i,w,n} = (s_1 m_n + i + w)/n$. In the above argument we have used the fact that $\|f_u\|_F = 1$ for $1 \le u \le p - d$. Therefore the first equation of (C.11) follows from (C.12). Write $h_{s_2} = f_{s_2} - f_{s_2,n}$. Similarly to (C.12), there exists a constant $M$ such that the corresponding bound holds uniformly for $1 \le s_1 \le N_n - 1$, $1 \le s_2 \le p - d$, $1 \le s_3 \le p$, $1 \le w \le k_0$.

Define $\tilde{T}_n$ as the counterpart of $T_n$ with the kernel basis $F$ in place of $\hat{F}$. Using Proposition C.1 and following the proof of Proposition 7.3, we have that for any sequence $g_n \to \infty$,
\[
P\Big(|T_n - \tilde{T}_n| \ge g_n^2 \big(\rho_n p^{(\delta - \delta_1)/2} + \tfrac{p^{\delta}}{\sqrt{n}}\big) m_n^{1/2} \Omega_n\Big) = O\Big(\frac{1}{g_n} + \frac{p^{\delta}}{\sqrt{n}} \log n\Big). \tag{C.14}
\]
On the other hand, similarly to the proof of Proposition 7.3, on the event $\{\hat{d}_n = d\}$,
\[
|\tilde{T}_n - \tilde{T}^*_n| \le \max_{\substack{1 \le s_1 \le N_n - 1,\, 1 \le s_3 \le p \\ 1 \le w \le k_0,\, 1 \le s_2 \le p - d}} \frac{1}{\sqrt{m_n}} \sum_{i=1}^{m_n} \big|f_{s_2}'\big(x_{s_1 m_n + i + w, n}\, x_{s_1 m_n + i, s_3} - x^*_{s_1 m_n + i + w, n}\, x^*_{s_1 m_n + i, s_3}\big)\big| \le \max_{\substack{1 \le s_1 \le N_n - 1 \\ 1 \le s_3 \le p,\, 1 \le w \le k_0}} \frac{1}{\sqrt{m_n}} \Big(\sum_{u=1}^p \Big(\sum_{i=1}^{m_n} \big(x_{s_1 m_n + i + w, u}\, x_{s_1 m_n + i, s_3} - x^*_{s_1 m_n + i + w, u}\, x^*_{s_1 m_n + i, s_3}\big)\Big)^2\Big)^{1/2}, \tag{C.15}
\]
where $x^*_{i,j}$ denotes the $j$th entry of $x^*_{i,n}$. Notice that the $L^{l/2}$-norm of the squared quantity in (C.15) can be bounded (C.16) by the sum of
\[
2\,\Big\|\Big(\sum_{u=1}^p \Big(\sum_{i=1}^{m_n} x_{s_1 m_n + i + w, u}\big(x_{s_1 m_n + i, s_3} - x^*_{s_1 m_n + i, s_3}\big)\Big)^2\Big)^{1/2}\Big\|_{L^{l/2}} \ \text{and}\ 2\,\Big\|\Big(\sum_{u=1}^p \Big(\sum_{i=1}^{m_n} x^*_{s_1 m_n + i, s_3}\big(x_{s_1 m_n + i + w, u} - x^*_{s_1 m_n + i + w, u}\big)\Big)^2\Big)^{1/2}\Big\|_{L^{l/2}}.
\]
Therefore, using the definition of $x^*_{i,n}$, straightforward calculations show that
\[
\Big\|\Big(\sum_{u=1}^p \Big(\sum_{i=1}^{m_n} \big(x_{s_1 m_n + i + w, u}\, x_{s_1 m_n + i, s_3} - x^*_{s_1 m_n + i + w, u}\, x^*_{s_1 m_n + i, s_3}\big)\Big)^2\Big)^{1/2}\Big\|_{L^l} \lesssim \sqrt{p}\, m_n \rho_n. \tag{C.17}
\]
As a consequence, by the proof of Proposition 7.3, for any sequence $g_n \to \infty$,
\[
P\big(|\tilde{T}^*_n - \tilde{T}_n| \ge g_n \rho_n m_n^{1/2} \Omega_n\big) = O\Big(\frac{1}{g_n} + \frac{p^{\delta}}{\sqrt{n}} \log n\Big).
\]
Notice that $y_i$ preserves the autocovariance structure of $\tilde{l}^*_i$.
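The first equation of (C.11) is driven by the fact that vectors in the kernel of $\Gamma$ annihilate the static part of the loadings, so only the $O(\rho_n)$ perturbation survives. The following sketch records this computation; it assumes, in line with the strength conditions of the main article, that $\|A \circ D(t)\|_2 = O(p^{(1-\delta_1)/2})$, that $f_{s_2}'A = 0$, and that $E(e_{i+w,n}\, x_{i,n}') = 0$ for $w \ge 1$ (the idiosyncratic components form a white noise, as in the framework adopted by the paper).

% Entrywise magnitude of E\tilde l_i under the local alternative
% x_{i,n} = A∘(J + ρ_n D(i/n)) z_{i,n} + e_{i,n}
%         = A z_{i,n} + ρ_n (A∘D(i/n)) z_{i,n} + e_{i,n}.
\begin{align*}
f_{s_2}' x_{i,n} &= \rho_n f_{s_2}'\big(A \circ D(i/n)\big) z_{i,n} + f_{s_2}' e_{i,n}
  && \text{(since } f_{s_2}'A = 0\text{)},\\
\big|E\big(f_{s_2}' x_{i+w,n}\, x_{i,s_3}\big)\big|
 &\le \rho_n \big\|f_{s_2}'\big(A \circ D(\tfrac{i+w}{n})\big) z_{i+w,n}\big\|_{L^2}\, \|x_{i,s_3}\|_{L^2}
  && \text{(Cauchy--Schwarz, } E(f_{s_2}' e_{i+w,n}\, x_{i,s_3}) = 0\text{)}\\
 &= O\big(\rho_n\, p^{\frac{1-\delta_1}{2}}\big),
\end{align*}

which matches the order of the bound (C.12).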
Then it follows from (C.14), (C.17) and the proof of Theorem 7.4 that, under the alternative hypothesis,
\[
\sup_{t \in \mathbb{R}} \big|P(T_n \le t) - P(|y|_\infty \le t)\big| = O\Big(\Big(\big(\rho_n p^{\frac{\delta - \delta_1}{2}} + \tfrac{p^{\delta}}{\sqrt{n}}\big)\, m_n^{1/2} \log n\, \Omega_n\Big)^{1/3} + \big(\rho_n m_n^{1/2} \log n\, \Omega_n\big)^{1/2} + \iota\big(m_n, k_0 N_n p(p-d), 2l, (N_n p^2)^{1/l}, \ldots\big)\Big).
\]
Notice that for $1 \le s_1 \le N_n - 1$, $1 \le s_2 \le p - d$, $1 \le s_3 \le p$, $1 \le w \le k_0$, the $\big(k_0(p-d)p s_1 + (w-1)p(p-d) + (s_2-1)p + s_3\big)$th entry of $\check{l}_i$ is $f_{s_2,n}' x_{s_1 m_n + i + w, n}\, x_{s_1 m_n + i, s_3}$, where $x_{i,s}$ denotes the $s$th entry of $x_{i,n}$. It is easy to see that (C.21) holds, where, by the definition of $D(\cdot)$ and the conditions on $\Delta_n$, the indices $s_1, s_2, s_3, w$ can be chosen such that $\frac{1}{\sqrt{m_n}} \big|\sum_{i=1}^{m_n} E\big(f_{s_2,n}' x_{s_1 m_n + i + w, n}\, x_{s_1 m_n + i, s_3}\big)\big|$ attains the order stated in (C.22), while
\[
\mathrm{Var}\Big(\frac{1}{\sqrt{m_n}} \sum_{i=1}^{m_n} f_{s_2,n}' x_{s_1 m_n + i + w, n}\, x_{s_1 m_n + i, s_3}\Big) = \frac{1}{m_n} \sum_{i_1, i_2} \mathrm{Cov}\big(f_{s_2,n}' x_{s_1 m_n + i_1 + w, n}\, x_{s_1 m_n + i_1, s_3},\; f_{s_2,n}' x_{s_1 m_n + i_2 + w, n}\, x_{s_1 m_n + i_2, s_3}\big). \tag{C.23}
\]
Consider the case $i_1 > i_2$. Using condition (G) and the representations $x_{s_1 m_n + i_1 + w, n} = A\big(\tfrac{s_1 m_n + i_1 + w}{n}\big) z_{s_1 m_n + i_1 + w, n} + e_{s_1 m_n + i_1 + w, n}$ and $x_{s_1 m_n + i_2 + w, n} = A\big(\tfrac{s_1 m_n + i_2 + w}{n}\big) z_{s_1 m_n + i_2 + w, n} + e_{s_1 m_n + i_2 + w, n}$, we have (writing $i_1' = s_1 m_n + i_1$ and $i_2' = s_1 m_n + i_2$ for short)
\[
\mathrm{Cov}\big(f_{s_2,n}' x_{i_1'+w}\, x_{i_1',s_3},\; f_{s_2,n}' x_{i_2'+w}\, x_{i_2',s_3}\big) = \mathrm{Cov}\Big(f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_1'+w}\, x_{i_1',s_3},\; f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_2'+w}\, x_{i_2',s_3}\Big) + \mathrm{Cov}\Big(f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_1'+w}\, x_{i_1',s_3},\; f_{s_2,n}' e_{i_2'+w}\, x_{i_2',s_3}\Big). \tag{C.24}
\]
Notice that (C.4) and condition (C1) lead to
\[
\big\|f_{s_2,n}' A\big(\tfrac{k}{n}\big)\big\|_F = O\big(\rho_n p^{\frac{1-\delta_1}{2}}\big) \tag{C.25}
\]
uniformly for $1 \le s_2 \le p - d$ and $1 \le k \le n$. Therefore,
\[
\mathrm{Cov}\Big(f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_1'+w}\, x_{i_1',s_3},\; f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_2'+w}\, x_{i_2',s_3}\Big) = f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big)\, \mathrm{Cov}\big(z_{i_1'+w}\, x_{i_1',s_3},\; z_{i_2'+w}\, x_{i_2',s_3}\big)\, \big(f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big)\big)' \le \rho_n^2\, p^{1-\delta_1} \big\|\mathrm{Cov}\big(z_{i_1'+w}\, x_{i_1',s_3},\; z_{i_2'+w}\, x_{i_2',s_3}\big)\big\|_F. \tag{C.26}
\]
On the other hand, using condition (G2) and condition (ii) of Theorem 7.5, by the proof of Lemma A.3 it follows that for $1 \le u \le d$, $1 \le v \le p$, $\Upsilon_{i,u,v} =: \Upsilon_{u,v}(\tfrac{i}{n}, \mathcal{F}_i) = z_{i,u,n}\, x_{i+k,v,n}$ is a locally stationary process with associated dependence measures
\[
\delta_{\Upsilon_{u,v},2}(h) = O\big(((h+1)\log(h+1))^{-2}\big). \tag{C.27}
\]
Therefore, by Lemma 5 of Zhou and Wu (2010), uniformly for $1 \le i_1, i_2 \le m_n$ and $1 \le s_3 \le p$,
\[
\big\|\mathrm{Cov}\big(z_{i_1'+w}\, x_{i_1',s_3},\; z_{i_2'+w}\, x_{i_2',s_3}\big)\big\|_F = O\big((|i_2 - i_1| \log |i_2 - i_1|)^{-2}\big). \tag{C.28}
\]
Together with (C.26) we have
\[
\frac{1}{m_n} \sum_{i_1, i_2} \mathrm{Cov}\Big(f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_1'+w}\, x_{i_1'},\; f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_2'+w}\, x_{i_2'}\Big) = O\big(\rho_n^2\, p^{1-\delta_1} \vee 1\big). \tag{C.29}
\]
By similar arguments using the proof of Lemma 5 of Zhou and Wu (2010) and condition (G3), we have
\[
\frac{1}{m_n} \sum_{i_1, i_2} \mathrm{Cov}\Big(f_{s_2,n}' A\big(\tfrac{i_2'+w}{n}\big) z_{i_1'+w}\, x_{i_1'},\; f_{s_2,n}' e_{i_2'+w}\, x_{i_2'}\Big) = O\big(\rho_n\, p^{\frac{1-\delta_1}{2}} \log^2 p \vee 1\big). \tag{C.30}
\]
By (C.21)–(C.23), (C.29) and (C.30), $\tilde{T}_n$ diverges at the rate $\sqrt{m_n}\, \rho_n\, p^{\frac{1-\delta_1}{2}}$, which proves (a).

To show (b), define $s^*_{j,w_n}$ and $s^*_{m_n}$ by replacing $\tilde{l}_i$ with $\tilde{l}^*_i$ in $s_{j,w_n}$ and $s_{m_n}$, and then define
\[
\upsilon^*_n = \frac{1}{\sqrt{w_n(m_n - w_n + 1)}} \sum_{j=1}^{m_n - w_n + 1} \Big(s^*_{j,w_n} - \frac{w_n}{m_n} s^*_{m_n}\Big) R_j. \tag{C.33}
\]
Then by the proof of Theorem 7.5 we obtain (C.34). Straightforward calculations using arguments similar to those in Step (ii) of the proof of Theorem 7.5 and equation (C.5) further show that, under the considered local alternative hypothesis, (C.35) holds. As a result, by the arguments of (B.25), (8.5) in the main article follows.

To evaluate the order of magnitude of $|\kappa_n|_\infty$, notice that $\kappa_n$ is a Gaussian vector given $\{x_{i,n}, 1 \le i \le n\}$; therefore the almost sure bound (C.39) holds. In the following we consider the event $\{\hat{d}_n = d\}$. Observe that
\[
|\kappa_n|_\infty \le \Big|\Big(\sum_{j=1}^{m_n - w_n + 1} \big(\hat{s}_{j,w_n} - \check{s}_{j,w_n} - \tfrac{w_n}{m_n}(\hat{s}_n - \check{s}_n)\big)^{\circ 2}\Big)^{\circ \frac{1}{2}}\Big|_\infty + \Big|\Big(\sum_{j=1}^{m_n - w_n + 1} \big(\check{s}_{j,w_n} - E\check{s}_{j,w_n} - \tfrac{w_n}{m_n}(\check{s}_n - E\check{s}_n)\big)^{\circ 2}\Big)^{\circ \frac{1}{2}}\Big|_\infty + M_1 \Big|\Big(\sum_{j=1}^{m_n - w_n + 1} \big(E\check{s}_{j,w_n} - \tfrac{w_n}{m_n} E\check{s}_n\big)^{\circ 2}\Big)^{\circ \frac{1}{2}}\Big|_\infty =: \mathrm{I} + \mathrm{II} + \mathrm{III} \tag{C.40}
\]
for some large positive constant $M_1$, where I, II and III are defined in the obvious way. For I, similarly to the proof of Theorem 7.5 and using Proposition C.1, we have $\mathrm{I}/\sqrt{w_n(m_n - w_n + 1)} = O_p\big(\tfrac{p^{\delta}}{\sqrt{n}} \sqrt{w_n}\, \Omega_n\big)$. Therefore, from (C.39)–(C.43), $|\kappa_n|_\infty = O_p\big(\sqrt{w_n}\, \rho_n\, p^{\frac{1-\delta_1}{2}} \sqrt{\log n}\big)$, which completes the proof.
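For intuition, the summability computation behind (C.28)–(C.29) in the preceding proof is elementary; the following display is a sketch under the decay rate (C.27), writing $a(h) = ((h+1)\log(h+1))^{-2}$ for the covariance envelope (our shorthand).

% The covariance decay (C.27)-(C.28) is summable in the lag, so the double
% sum over (i_1, i_2) in (C.23) is O(m_n); after the 1/m_n normalization,
% only the prefactor rho_n^2 p^{1-delta_1} from (C.26) survives, as in (C.29).
\[
\frac{1}{m_n} \sum_{i_1=1}^{m_n} \sum_{i_2=1}^{m_n} a\big(|i_1 - i_2|\big)
\;\le\; \frac{1}{m_n} \sum_{i_1=1}^{m_n} \Big( a(0) + 2 \sum_{h=1}^{\infty} a(h) \Big)
\;=\; a(0) + 2 \sum_{h=1}^{\infty} \frac{1}{(h+1)^2 \log^2(h+1)} \;=\; O(1).
\]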
References

The use of factor analysis in the statistical analysis of multiple time series.
Inferential theory for factor models of large dimensions.
Determining the number of factors in approximate factor models.
Simultaneous multiple change-point and factor analysis for high-dimensional time series.
Time-varying general dynamic factor models and the measurement of financial connectedness.
Testing for structural breaks in dynamic factor models.
Time series: data analysis and theory.
Large sample sieve estimation of semi-nonparametric models.
Comparison and anti-concentration bounds for maxima of Gaussian random vectors.
Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.
Fitting time series models to nonstationary processes.
Locally stationary processes.
Fitting dynamic factor models to non-stationary time series.
A dynamic semiparametric factor model for implied volatility string dynamics.
The generalized dynamic-factor model: identification and estimation.
Abstract inference.
Nonparametric sieve regression: least squares, averaging least squares, and cross-validation.
Factor modeling for high-dimensional time series: inference for the number of factors.
Estimation of latent factors for high-dimensional time series.
Estimation of large dimensional factor models with an unknown number of breaks.
Locally stationary factor models: identification and nonparametric estimation.
Identifying a simplifying structure in time series.
Dynamic factor models.
Forecasting using principal components from a large number of predictors.
On time-varying factor models: estimation and testing.
Multivariate time series analysis: with R and financial applications.
Factor models for matrix-valued high-dimensional time series.
On the convergence rates of Legendre approximation.
Multivariate Time Series Analysis and Applications.
Testing for factor loading structural change under common breaks.
A useful variant of the Davis–Kahan theorem for statisticians.
Gaussian approximation for high dimensional vector under physical dependence.
Measuring nonlinear dependence in time-series, a distance correlation approach.
Heteroscedasticity and autocorrelation robust structural change detection.
Local linear quantile estimation for nonstationary time series.
Simultaneous inference of linear models with time varying coefficients.