key: cord-0157299-0rm1uuxh
authors: Duan, Jiangtao; Bai, Jushan; Han, Xu
title: Quasi-maximum likelihood estimation of break point in high-dimensional factor models
date: 2021-02-25
journal: nan
DOI: nan
sha: 1b0911042b47f262febf3c108a13edb263b74c2d
doc_id: 157299
cord_uid: 0rm1uuxh

This paper estimates the break point for large-dimensional factor models with a single structural break in factor loadings at a common unknown date. First, we propose a quasi-maximum likelihood (QML) estimator of the change point based on the second moments of factors, which are estimated by principal component analysis. We show that the QML estimator performs consistently when the covariance matrix of the pre- or post-break factor loading, or both, is singular. When the loading matrix undergoes a rotational type of change while the number of factors remains constant over time, the QML estimator incurs a stochastically bounded estimation error. In this case, we establish an asymptotic distribution of the QML estimator. The simulation results validate the feasibility of this estimator when used in finite samples. In addition, we demonstrate empirical applications of the proposed method by applying it to estimate the break points in a U.S. macroeconomic dataset and a stock return dataset.

Large factor models assume that a few factors can capture the common driving forces of a large number of economic variables. Although factor models are useful, practitioners have to be cautious about the potential structural changes. For example, either the number of factors or the factor loadings may change over time. This concern is empirically relevant because parameter instability is pervasive in large-scale panel data.

So far, many methods have been developed to test structural breaks in factor models (e.g., Stock and Watson (2008) , Breitung and Eickmeier (2011) , and Chen et al. (2014) ). The rejection of the null hypothesis of no structural change leads to the subsequent issues of how to estimate the change point, determine the numbers of pre-and post-break factors, and estimate the factor space. Chen (2015) considers a least-squares estimator of the break point and proves the consistency of the estimated break fraction (i.e., the break date k divided by the full time series T , k T ). Cheng et al. (2016) propose a shrinkage method to obtain a consistent estimator of the break fraction. Baltagi et al. (2017) develop a least-squares estimator of the change point based on the second moments of the estimated pseudo-factors and show that the estimation error of the proposed estimator is Op(1), which indicates the consistency of the estimated break fraction. A few recent studies also explore a consistent estimation of break points, which is technically more challenging. Ma and Su (2018) develop an adaptive fused group Lasso method to consistently estimate all break points under a multibreak setup. Barigozzi et al. (2018) propose a method based on wavelet transformations to consistently estimate the number and locations of break points in the common and idiosyncratic components. Bai et al. (2020) establish the consistency of the least-squares estimator of the break point in large factor models when factor loadings are subjected to a structural break and the size of the break is shrinking as the sample size increases. Although the estimators proposed in these studies are consistent under certain assumptions, the simulation results show that they perform poorly when (1) the number of factors changes after the break or (2) the loading matrix undergoes a rotational type of change.

According to the factor model literature, a factor model with a break in factor loadings is observationally equivalent to that with constant loadings and possibly more pseudo-factors (e.g., Han and Inoue (2015) and Bai and Han (2016) ). Thus, the estimation of the change point of factor loadings can be converted into that of the change point of the second moment of the pseudo-factors. We propose a quasi-maximum likelihood (QML) method to estimate the break point based on the second moment of the estimated pseudo-factors; therefore, the number of original factors is not required to be known for computing our estimator. First, we estimate the number of pseudo-factors in an equivalent representation that ignores the break, and then estimate the pre-and post-break second moment matrices of the estimated pseudo-factors for all possible sample splits. The structural break date is estimated by minimizing the QML function among all possible split points. This paper makes the following contributions to the literature. First, we establish the consistency of the QML break point estimator if the break leads to more pseudo-factors than the original pre-or post-break factors. This occurs when the break augments the factor space or in the presence of disappearing or emerging factors. Under these circumstances, the covariance matrix of loadings on the pre-or post-break pseudo-factors is singular, which is the key condition to establish the consistency of our QML estimator. To the best of our knowledge, this is the first study that links the consistency of the break point estimator to the singularity of covariance matrices of loadings on pre-and post-break pseudo-factors. In addition, we prove that the difference between the estimated and true change points is stochastically bounded when both pre-and post-break loadings on the pseudo-factors have nonsingular covariance matrices. In this case, the loading matrix only undergoes a rotational change, and both the numbers of pre-and post-break original factors are equal to the number of pseudo-factors.

The aforementioned singularity leads to a technical challenge of analyzing the asymptotic property. The singular population covariance matrix of the pre(post)-break loadings has a zero determinant, whose logarithm is not defined appropriately.

To resolve this issue, we show that the estimated covariance matrices have nonzero determinants and a well-defined inverse for any given sample size, by obtaining the convergence rate of the lower bound of their smallest eigenvalues. This ensures that the objective function based on the estimated covariance is appropriately defined in any finite sample.

Our second major contribution is that the QML method allows a change in the number of factors. Namely, it allows for disappearing or emerging factors after the break. This is an advantage over the methods developed by Ma and Su (2018) and Bai et al. (2020) , who assume that the number of factors remains constant after the break. Our simulation result indicates that the estimator proposed by Bai et al. (2020) is inconsistent when some factors disappear and the remaining factors have time-invariant loadings. Baltagi et al. (2017) allow a change in the number of factors; however, their estimation error was only stochastically bounded. In contrast, our QML estimator remains consistent under varying number of factors.

Finally, the QML method has a substantial computational advantage over the estimators that iteratively implement high-dimensional principal component analysis (PCA) . For example, the estimator proposed by Bai et al. (2020) runs PCA for pre-and post-split sample covariance matrices for all possible split points. In comparison, our QML runs PCA for the entire sample only once, and thus, is computationally more efficient, especially in large samples.

The rest of this paper is organized as follows. Section 2 introduces the factor model with a single break on the factor loading matrix and describes the QML estimator for the break date. Section 3 presents the assumptions made for this model. Section 4 presents the consistency and asymptotic distribution of the QLM estimator for the break date. Section 5 investigates the finite-sample properties of the QML estimator through simulations. Section 6 implements the proposed method to estimate the break points in a monthly macroeconomic dataset of the United States and a dataset of weekly stock returns of Nasdaq 100 components. Section 7 concludes the study.

Let us consider the following factor model with a common break at k0 in the factor loadings for i = 1, · · · , N :

λi1ft + eit f or t = 1, 2, · · · , k0(T )

λi2ft + eit f or t = k0(T ) + 1, · · · , T,

where ft is an r−dimensional vector of unobserved common factors; r is the number of pseudo-factors; k0(T ) is the unknown break date; λi1 and λi2 are the pre-and post-break factor loadings, respectively; and eit is the error term allowed to have serial and cross-sectional dependence as well as heteroskedasticity. τ0 ∈ (0, 1) is a fixed constant and [x] represents the integer part of x. For notational simplicity, hereinafter, we suppress the dependence of k0 on T . Note that we formulate the model using pseudo-factors instead of the original underlying factors. This simplifies the representation of various breaks in a unified framework, which will be clarified in the examples below.

In vector form, model (1) can be expressed as xt =      Λ1ft + et f or t = 1, 2, · · · , k0

Λ2ft + et f or t = k0 + 1, · · · , T,

where xt = [x1t, · · · , xNt] ′ , et = [e1t, · · · , eNt] ′ , Λ1 = [λ11, · · · , λN1] ′ , and Λ2 = [λ12, · · · , λN2] ′ .

For any k = 1, · · · , T − 1, we define

where the subscript k denotes the date at which the sample is to be split, and the superscripts (1) and (2) denote the pre-and post-k data, respectively. We rewrite (2) using the following matrix representation:

where Λ is an N × r matrix with full column rank. The pre-and post-break loadings are modeled as Λ1 = ΛB and Λ2 = ΛC, respectively, where B and C are some r × r matrices. In this model, r1 = rank(B) ≤ r and r2 = rank(C) ≤ r denote the numbers of original factors before and after the break, respectively. To distinguish them from the original factors, we refer to G as the pseudo-factors in (3) and rank(G) = r. Hence, the last line of (3) provides an observationally equivalent representation with constant loadings Λ and r pseudo-factors G. It is well known that the break can augment the factor space; thus, r1 ≤ r and r2 ≤ r. F

(1)

k 0 have dimensions k0 × r and (T − k0) × r, respectively, and Λ1 and Λ2 have dimension N × r. Our representation in (3) allows for changes in the factor loadings and the number of factors. Below, several examples are provided to illustrate that the pseudo-factor representation in (3) is general enough to cover three types of breaks. Type 1. Both B and C are singular. In this case, the number of original factors is strictly less than that of the pseudo-factors both before and after the break (i.e., r1 < r and r2 < r). This means that the structural break in the factor loadings augments the dimension of the factor space. Let us consider the following example.

(1) k 0 (k0 × r1) and F

(2) k 0 ((T − k0) × r2) denote the original factors before and after the break, respectively, and Θ1 and Θ2 denote the pre-and post-break loadings on these factors. Thus, this model can be represented and transformed

where Λ = [Θ1, Θ2], B = diag(Ir 1 , 0r 2 ×r 2 ), C = diag(0r 1 ×r 1 , Ir 2 ), F

k 0 ], and the asterisk denotes some unidentified numbers such that all rows in F (1) k 0 and F (2) k 0 have the same variance (to satisfy Assumption 1 in Section 3). In the special case of r1 = r2, Λ is of full rank 2r1 (i.e., the dimension of the pseudo-factor space is twice that of the original factor space) if the shift in the loading matrix Θ2 − Θ1 is linearly independent of Θ1. We refer to this special case as the shift type of change, because the augmentation of the factor space is induced by a linearly independent shift in the loading matrix. Hence, Type 1 covers the shift type of change.

Type 2. Only B or C is singular. In this case, emerging or disappearing factors are present in the model. Let us consider the following example of disappearing factors. Example (2): Without loss of generality, let us assume that r2 < r1 and Θ2 is equal to the first r2 columns of Θ1; thus, the last r1 − r2 factors disappear after the break. Therefore, we can obtain the pseudo-factors by using the following transformation from the original factors F:

where F

(1)

. . . * ], C = diag(Ir 2 , 0 (r 1 −r 2 )×(r 1 −r 2 ) ), Λ = Θ1, and the asterisk is defined in a similar manner to that in (4). In this example, B = Ir 1 , r = r1, and r2 = rank(C) < r. Symmetrically, if B is singular and C = Ir 2 , then r2 = r and r1 = rank(B) < r, which means that certain factors emerge after the break point. Type 2 changes are important in empirical analysis. Please refer to Mcalinn et al. (2018) for empirical evidence regarding the varying number of factors in the U.S. macroeconomic dataset. For Types 1 and 2, we obtain a significant result that P (k−k0 = 0) → 1 as N, T → ∞. 1 Type 3. Both B and C are nonsingular. In this case, the loadings on the original factors undergo a rotational change, and the dimension of the original factors is the same as that of the pseudo-factors.

Example (3): Let us assume that r2 = r1 and Θ2 = Θ1C for a nonsingular matrix C. The model with the original factors F can be transformed into the following pseudo-factor representation:

where F

(1)

k 0 , and Λ = Θ1. In this example, B = Ir 1 and r = r1 = r2, and the factor dimension remains constant. In the observationally equivalent pseudo-factor representation, the loading is time-invariant and the original postbreak factors F

(2) k 0 are rotated by C. We refer to this as the rotation type of change.

The above examples show that a factor model with any of these three types of change can be unified and reformulated by the representation in (3) with pseudo-factors. This representation controls the break type by varying the settings for B and C, and thus, is convenient for our theoretical analysis. Bai et al. (2020) rule out the rotation type of change because the break date is not identifiable by minimizing the sum of squared residuals. Baltagi et al. (2017) allow changes in the number of factors and rotation type of change; however, the difference between their estimator and the true break point is only stochastically bounded (i.e., their estimator is not consistent). Ma and Su's (2018) setup requires r1 = r2; thus, Type 2 is ruled out under their assumptions. Our simulation result shows that Ma and Su's estimator does not perform well under rotational changes (Type 3), whereas our QML method can handle changes in all three types discussed above. We obtain a significant result thatk − k0 = Op(1) if both B and C are of full rank (i.e., Type 3) andk − k0 = op(1) if B or C, or both, is singular (i.e., Type 1 and Type 2).

In this paper, we consider the QML estimator of the break date for model (3):

where [τ1T ] and [τ2T ] denote the prior lower and upper bounds for the real break point k0 with τ1, τ2 ∈ (0, 1) and τ1 ≤ τ0 ≤ τ2.

The QML objective function UNT (k) is equal to

whereΣ1 andΣ2 can be defined asΣ

andĝt is the PCA estimator of gt (i.e., the transpose of the t-th row of G). We define ΣG,

where V and Φ are the eigenvalue and eigenvector matrices of Σ 1/2 Λ ΣGΣ 1/2 Λ , respectively. Evidently, the second moment of H0gt shares the same change point as that of gt. Therefore, we proceed to estimate the pre-and post-break second moments of gt by using the estimated factorsĝt, and then use (7) to obtain the QML break point estimatorkQML. Similar QML objective functions have been used for multivariate time series with observed data (e.g., Bai (2000) ).

In this section, we state the assumptions made for validating the consistency and asymptotic distribution of the QML estimator.

Assumption 3. There exists a positive constant M < ∞ such that (i) E(eit) = 0 and E|eit| 8 ≤ M for all i = 1, · · · , N and t = 1, · · · , T ;

T s=1 |γN (s, t)| ≤ M for every t ≤ T ;

(iii) E(eitejt) = τij,t with |τij,t| < τij for some τij and for all t = 1, · · · , T and N j=1 |τij | ≤ M for every i ≤ N ;

(iv) E(eitejs) = τij,ts,

|τij,ts| ≤ M ;

(v) For every (s, t) ,

Assumption 4. There exists a positive constant M < ∞ such that

Assumption 5. The eigenvalues of ΣGΣ Λ are distinct.

Assumption 6. Let us define ǫt = ftf ′ t − ΣF . According to the data-generating process (DGP) of factors, the Hájek-Rényi inequality applies to the processes {ǫt, t = 1, · · · , k0}, {ǫt, t = k0, · · · , 1}, {ǫt, t = k0 + 1, · · · , T }, and {ǫt, t = T, · · · , k0 + 1}.

Using the Hájek-Rényi equality on ǫt, we can ensure that max (1) 

Assumption 8. There exists an M < ∞ such that for all values of N and T ,

In this section, we derive the asymptotic properties of the QML estimator for various breaks. In the literature of structural breaks for a fixed-dimensional time series, conventional break point estimators, such as the least-squares (LS) estimator of Bai (1997) or the QML estimator of Qu and Perron (2007) , are usually inconsistent. The estimation error of these conventional estimators is Op(1) when the break size is fixed. To reach consistency, the cross-sectional dimension of the time series must be large (e.g., Bai (2010) and Kim (2011) ).

Recall that the observationally equivalent representation in (3) has time-invariant loadings and varying pseudo-factors.

Hence, our problem converges to estimating the break point in the r-dimensional time series gt, where r is fixed. Theorems 1 and 2 below show that, for rotational breaks (Type 3), the convergence rate and limiting distribution are similar to those available in the literature. However, for Type 1 and 2 breaks, Theorem 3 derives a much more significant result than that available in the literature, according to which our QML estimator is consistent even if our gt has only a fixed cross-sectional dimension r. 

This result shows that the limiting distribution depends on ξt. If ξt is independent over time, then W (ℓ) is a two-sided random walk. If ft is stationary, then ξt is stationary in each regime. Here, the limiting distribution of the estimated break date is dependent on the generation processes of the unobserved factors, and thus, cannot be directly used to construct a confidence interval for a true break point. Bai et al. (2020) propose a bootstrap method to construct a confidence interval for k0 when the change in the factor loading matrix shrinks as N → ∞. However, their bootstrap procedure lacks robustness in the cross-sectional correlation in the error terms. In the current setup, the break magnitude Σ2 − Σ1 is fixed and we leave the case of shrinking break magnitude as a future topic.

Next, we establish a much stronger result than that available in the literature, which states that the QML estimator remains consistent when B or C, or both, is singular. We make the following additional assumptions.

Assumption 9. With probability approaching one (w.p.a.1), the following inequalities hold:

as N, T → ∞, where c and c are some constants.

Assumption 9 is useful to derive the lower bound of the smallest eigenvalue ofΣ1 (orΣ2) if B (or C) is a singular matrix.

Thus, Assumption 10 is similar to Assumption F2 of Bai (2003) . As the log determinant of the matrix is involved in the QML function, a natural problem is that the log determinant of a singular matrix is infinity when B or C, or both, is singular.

Note that k t=1

in Assumptions 9-10 involve a positive fraction of observations over time since the low bound of k is τ1T , where τ1 ∈ (0, 1). When Σ1 and Σ2 are singular matrices, the determinants ofΣ1 = 1

but not equal to zero in finite samples. The following proposition develops a lower bound for the smallest eigenvalues ofΣ1

andΣ2.

In proposition 1, the lower bound of the smallest eigenvalues of the estimated sample covariance matricesΣ1 andΣ2 is cL/N for a constant cL > 0 w.p.a.1. This ensures a lower bound for the determinants of the estimated sample covariance matrices. Proposition 1 provides a useful tool to establish the consistency of our QML estimator. Although this technical result is a byproduct in our analysis, we believe that it is of independent interest and useful in other contexts.

and 1 < τ 0 < 1, [B, C] row full rank such that Σ G is a positive definite matrix.

Assumption 11(i) implies that ΣG is positive definite, and B # C = 0 when r − r1 = 1, and C # B = 0 when r − r2 = 1.

Assumption 11(ii) is to exclude the possibility that f k 0 and f k 0 +1 in the null space of C # B and B # C, respectively, then the specific low bound of |Σ2(k)| with respect to k0 − k can be obtained when k < k0 and k0 − k is bounded as N, T → ∞.

Similarly, Assumption 11(iii) also exclude the possibility that Bf k 0 in the column space of C when r − r2 ≥ 2 or r2 = 0 and Cf k 0 +1 in the column space of B when r − r1 ≥ 2 or r1 = 0, then the specific low bound of |Σ2(k)| with respect to k0 − k can be obtained when k < k0 and k0 − k is divergent as N, T → ∞. Assumption 11 was used to establish Lemma 8, which is useful for validating the consistency result that P rob(k − k = 0) → 1 in the proof of Theorem 3. If the factor f k 0 and f k 0 +1

have continuous probability distribution functions, then Assumption 11(ii)-(iii) are to exclude a zero probability event since C # B and B # C are not equal to 0. From another perspective, Assumption 11(ii)-(iii) allow ft have various data generating process.

Theorem 3 shows that the estimated change point converges to the true change point w.p.a.1 when B or C, or both, is singular (Types 1 and 2 in Section 2). This result is much more significant than that obtained by Baltagi et al. (2017) , who

show that the distance between the estimated and true break dates is bounded for Types 1-3. Note that the case in which only B (or C) is singular corresponds to Type 2 with emerging (or disappearing) factors. Our QML estimator is consistent under this type of change, whereas Bai et al. (2020) and Ma and Su (2018) rule out this type by assumption. In empirical applications, the conditions of theorem 3 are rather flexible and likely to hold and the consistency of the break date estimator is expected in most economic data for the factor analysis.

of the covariance matrices of the pre-or post-break factor loadings. The conditions that B or C, or both, is singular and N T → κ ∈ (0, ∞) are likely to hold in many economic datasets for factor analysis. If both B and C are singular, the break occurs such that the number of pseudo-factors in the entire factor model is larger than that of the factors in the pre-and post-break subsamples. For example, if all factors undergo large breaks in their loadings, the number of factors tends to be doubled (see Breitung and Eickmeier (2011) ). If B is of full rank and C is singular, some factors become useless, and thus, the loading coefficients attached to these disappearing factors become zero. For example, in the momentum portfolio, some risks are not part of the firm's long-run structure as the sorting is only based on recent returns works; the reward is high but disappears within less than a year. If B is singular and C is of full rank, some factors emerge after the break date, which increases the dimension of the post-break factor space. For example, changes in the technology or policy may produce certain new factors.

UNT (k) − UNT (k0) is always larger than zero, even if k deviates only slightly from the true break point k0, so thatk must be equal to k0 to minimize UNT (k) − UNT (k0). For example, in Type 1, when both B and C are singular for k < k0, we can Remark 5. With the QML estimator, we do not need to know the numbers of original factors r1 and r2 before and after the break point, but only the number of pseudo-factors in the entire sample. Bai et al. (2020) and Ma and Su (2018) require knowledge of the number of original factors, which is much more difficult to estimate due to the augmented factor space resulting from the break. In practice, the number of pseudo-factors is much easier to estimate by using one of a number of estimators, such as the information criteria developed by Bai and Ng (2002) .

In this section, we consider DGPs corresponding to Types 1-3 to evaluate the finite sample performance of the QML estimator. We compare the QML estimator with three other estimators. As shown below,kBKW is the estimator proposed by Baltagi, Kao, and Wang (2017, BKW hereafter) ;kBHS is the estimator proposed by Bai, Han, and Shi (2020, BHS hereafter) ; kMS is the estimator proposed by Ma and Su (2018, MS hereafter) ; andkQML is the QML estimator. Barigozzi et al. (2018) develops a change point estimator using wavelet transformation, which exhibits similar performance to that of the estimator proposed by Ma and Su (2018) . Hence, the comparison with the estimator proposed by Barigozzi et al. (2018) is not reported here, but the result is available upon request. The DGP roughly follows BKW, which can be used to examine various elements that may affect the finite sample performance of the estimators, and we use this DGP for model (3). We calculate the root mean square error (RMSE) and mean absolute error (MAE) of these change point estimatorskBKW ,kBHS, andkQML, and each experiment is repeated 1000 times, where RMSE= We generate factors and idiosyncratic errors using a DGP similar to that of BKW. Each factor is generated by the following AR(1) process:

The scalar ρ captures the serial correlation of factors, and the idiosyncratic errors are generated by

(1−α 2 )Ω ). The scalar α captures the serial correlation of the idiosyncratic errors, and Ω is generated as Ωij = β |i−j| so that β captures the degree of cross-sectional dependence of the idiosyncratic errors. In addition, ut and vt are mutually independent for all values of t. We set r0 = 3 and k0 = T /2. We consider the following DGPs for factor loadings and investigate the performance of the QML estimator for the three types of breaks discussed in Section 2. DGP 1.A We first consider the case in which C is singular, and set C = [1, 0, 0; 0, 1, 0; 0, 0, 0]. This setup aims to model (5). In the pre-break regime, all elements of λi,1 are i.i.d. N (0, 1 r 2 0 Ir 0 ) across i. In the post-break regime, Λ2 = (λ1,2, · · · , λN,2) ′ = Λ1C. This case corresponds to a Type 2 change with a disappearing factor. The number of pseudofactors is the same as r0, so r = 3, and the numbers of pre-and post-break factors are 3 and rank(C) = 2, respectively. Table   1 lists the RMSEs and MAEs of three estimators for different values of (ρ, α, β). In all cases,kQML has much smaller MAEs and RMSEs thankBKW andkBHS. Moreover, the MAEs and RMSEs ofkQML tend to decrease as N and T increase. This confirms the consistency ofkQML established in Theorem 3. In addition, the RMSEs and MAEs ofkBKW do not converge to zero as N and T increase, which confirms thatkBKW has a stochastically bounded estimation error.kBHS does not appear to be consistent when a factor disappears after the break. Moreover, a larger AR(1) coefficient ρ tends to deteriorate the performance ofkBKW , but does not have much impact on our QML estimator.

We next consider the case in which C is of full rank. We set C as a lower triangular matrix. The diagonal elements are equal to 0.5, 1.5, and 2.5, and the elements below these diagonal elements are i.i.d. and drawn from a standard normal distribution. Under this DGP, we have r = r0. DGP 1.C In this case, we set C = [1, 0, 0; 2, 1, 0; 3, 2, m] and m ∈ {1, 0.8, 0.5, 0.1, 0}. As m decreases to zero, the matrix C changes from full rank to singular. We still consider serial correlation in factors and serial correlation and cross-sectional dependence in idiosyncratic errors simultaneously with N = 100, T = 100. Table 3 shows that the MAEs and RMSEs of kQML decrease with m, which confirms our findings in Theorems 1 and 3. In addition, the RMSEs and MAEs ofkBKW and kBHS are much larger than those ofkQML, and do not tend toward zero as m decreases. For each value of m, the experiment is repeated 10000 times to more accurately estimate and compare the RMSEs (MAEs) of our QML estimator across different values of m.

DGP 1.D This DGP considers a Type 1 break. In the first regime, the last elements of λi,1 are zeros for all i, and the first two elements of λi,1 are both i.i.d. N (0, 1 2 Ir 0 ). In the second regime, λi,2 is i.i.d. N (0, 1 3 Ir 0 ) across i. As λi,1 and λi,2 are independent, the numbers of factors in the two regimes are r1 = 2andr2 = 3, respectively, and the number of pseudo-factors is r = 5. Because the numbers of pre-or post-break factors are smaller than that of the pseudo-factors, both Σ1 and Σ2 are singular matrices. Table 4 reports the MAEs and RMSEs ofkQML,kBHS, andkBKW under this DGP. Table 4 shows the suitable performances of bothkBHS and ourkQML. Their MAEs (RMSEs) are less than 0.05 (0.25) for all combinations of N , T , ρ, α, and β. AlthoughkBHS is consistent under this DGP, our QML estimator still has smaller RMSEs thankBHS in most cases reported in Table 4 . In addition,kBKW performs better under this DGP than DGPs 1.A-1.C. However, its estimation error is much larger than that of our QML estimator. This is not surprising becausekBKW is not consistent.

Finally, a larger AR(1) coefficient ρ tends to yield a larger bias forkBKW , but does not have much effect on the performances ofkBHS andkQML.

In summary, Tables 1 and 2 show that the QML estimator performs much better thankBHS under Type 2 and 3 breaks, which are ruled out under the assumptions of Bai et al. (2020) . Table 4 shows that the QML estimator tends to slightly outperformkBHS, even though the latter is known to be consistent under Type 1 breaks. BHS method is super good for Type 1 changes with smaller breaks. QML method will lose its power when breaks are small like in BHS's settings in their paper, because the dimension of G (determined by IC criterion in Bai and Ng (2002) ) will not be augmented when breaks are small, which means the singularity does not show up in the covariance if breaks are small enough. Tables 1-4 : the QML estimatorkQML can detect the true break date with higher probabilities than others regardless of the values of (ρ, α, β). The MS method sometimes detects more than one or no break; hence, we only compute its probability of correctly estimating k0 under the condition that it detects a single break. The probabilities of a correct estimation of the QML method increase with the sample sizes N and T in Tables 5, 6, and 8. Table 7 shows that the probabilities of correct estimation of the QML estimators increase as m decreases. A smaller m means that C is closer to a singular matrix. Table 7 is consistent with Table 3 , and confirms Theorems 1 and 3. To explore in more detail the effect of changes in m on the QML estimator, we vary the value of m using finer grids and find a similar pattern to that shown in Table 7 . The results are reported in the supplementary appendix. Figures 1 and 2 show the frequency of the estimated change points under DGP 1.A for N = 100, T = 100 and N = 500, T = 500 for 1000 replications. According to these figures, the QML estimators exhibit the highest frequency around the true break under different settings. When we increase the (N, T ) value from 100 to 500, the frequency at the true break point increases and the simulated distribution becomes tighter. This indicates that the QML estimators are highly likely to identify the true break point. This is consistent with our theory. However, the other three methods are found to have much larger variation and substantially lower probabilities to correctly estimate the break point. Thus, the QML estimators are advantageous in this case. Moreover, the simulation result indicates that for a sample size exceeding N = 5000, T = 1000, the probabilities of correctly estimating the QML estimator exceed 90%.

Recall that BKW and QML only have Op(1) estimation errors under DGP 1.B. However, Table 6 shows that in all cases, the probabilities of correct estimation by the QML estimator are much higher than those of correct estimation by the BKW 

In the first empirical application, we apply our proposed method to a U.S. macroeconomic dataset (Stock and Watson (2012) ) to detect the possible structural breaks in the underlying factor model. We use the dataset adopted by Cheng et al. equal to one and two for the pre-and post-break subsamples, respectively. Then, they implement the LS estimation and obtain the estimated break pointk = 2008 : 12. To implement our QML method, we first use Bai and Ng's information criterion IC1 and determine three pseudo-factors in the complete sample. Based on this result, we compute our QML estimator and obtain 2007:07 as the estimated break point, using which we split the sample into pre-and post-break subsamples. IC1 of Bai and Ng (2002) detects two pre-break and three post-break factors. From the numbers of pre-and post-break factors and that of pseudo-factors, one factor appears to emerge over time and the QML estimator is consistent based on Theorem 3.

The second empirical application uses the weekly rate of return for Nasdaq 100 Index from April 18, 2019, to October 1, 2020. As all companies have data starting from April 18, 2019, we choose that as the start date. Traditionally, the index is limited to 100 common-stock issues, with only one issue allowed per user. Now, the index is limited to 100 issuers, some of which may have multiple issues as index components. The current index has 103 components, representing 100 issuers, four of which are from China: Baidu, JD.com, Ctrip, and NetEase. Thus, the sample size is T = 76 and N = 103. As IC1 and IC2 of Bai and Ng (2002) , the methods proposed by Onatski (2010), Ahn and Horenstein (2013) , and Fan et al. (2019) yield different numbers of pseudo-factors for all samples, we use different number of factors r = 2, 3, 4, 5, 6, 7 to estimate the break date by using the QML method, and find that the estimated break date always falls in the week of February 20, 2020.

This result agrees with that obtained using the method developed by Baltagi et al. (2017) .

In fact, after receiving a briefing that the COVID-19 epidemic was about to spread, the Senate Intelligence Committee

Chairman Richard Burr sold 628, 000 to 1.72 million stocks in a one-day transaction on February 13, 2020. A week after this, that is, the week of February 20, 2020, the stock market began to fall sharply, and two weeks later, U.S. stocks halted for the first time. Thus, the factor loading matrix appears to have changed in the early days of the epidemic.

We study the QML method for estimating the break point in high-dimensional factor models with a single large structural change. We consider three types of change and develop an asymptotic theory for the QML estimator. We show that the QML estimator is consistent when the covariance matrices of the pre-or post-break factor loadings, or both, are singular. In addition, the estimation error of the QML estimator is Op(1) when there is a rotation type of change in the factor loading matrix, and thus, the covariance matrices of the pre-and post-break loadings are both nonsingular. The limiting distribution of the estimated break point can also be derived in this case. The simulation results validate the suitable performance of the QML estimator. We use the proposed method to estimate the break point for U.S. macroeconomic data and stocks data.

The estimated break date is July 2007 for the macroeconomic data and February 20, 2020, for the stocks data.

In model (3), 

For notational simplicity, letΣ 0 1 ≡Σ1(k0) andΣ 0 2 ≡Σ2(k0) . The QML objective function can be expressed as

If k = k0, the objective function is

The full-sample PCA estimatorĜ satisfies the following identity:

where H = Λ ′ ΛG ′Ĝ V −1 NT /N T and VNT is a diagonal matrix comprising the eigenvalues of XX ′ /N T . Hence, for each period t, we havê

Bai (2003) shows thatĝ

From (A.1) and Lemma A.2 in Bai (2003) , we have the following lemma:

(ii). Under Assumptions 1-9,

wherec > 0 is a constant.

Proof. See the supplementary appendix. ✷ Both Σ 1 and Σ 2 are positive definite matrices.

We first consider the case in which both Σ1 and Σ2 are positive definite matrices.

Following Baltagi et al. (2017) , we define 

H0 is nonsingular by Proposition 1 of Bai (2003) .

Before analyzing the consistency of the estimated fraction and the boundedness of the estimation error, we need to prove the following lemmas. For any given 0 < η ≤ min(τ0, 1 − τ0) and M > 0, define (1), By symmetry, it suffices to study the case of k < k0. Expanding UNT (k) − UNT (k0) gives

To proveτ − τ0 = op(1), we need to show that for any ε > 0 and η > 0, P (|τ − τ0| > η) < ε as (N, T ) → ∞, and that 0) ; thus, it suffices to show that for any given ε > 0 and η > 0, P ( min

Therefore, the following two events are equivalent:

Note that

Now, using (12) and (13), we have

where min (11) and Lemmas 2 (i), (iii), (v) and (vi). Note that (1) by Lemma 3. Thus, by (14) and (16), we can bound (15) by

. By the property of a characteristic polynomial, we have

where ρi(X) is the i-th eigenvalue of X for i = 1, · · · , r. On the partial derivative with respect to ρi(X), we have (X) .

From the derivative with respect to ρi(X), ∂g(X) ∂ρ i (X) < 0 for 0 < ρi(X) < 1 and ∂g(X) ∂ρ i (X) > 0 for ρi(X) > 1. Thus, for g(X) to achieve its minimum value, all eigenvalues of X must be one (i.e., all eigenvalues of the symmetric matrixΣ 0−1/2 2Σ 0 1Σ 0−1/2 2 should be equal to one); thus,Σ 0−1/2 2Σ 0 1Σ 0−1/2 2 = I andΣ 0 1 =Σ 0 2 . This implies that g(Ir) = 0 is a unique minimum of g(X). (1) under Assumption 1 and the fact that gt = Bft (1). As H →p H0 is a nonsingular matrix and B and C are nonsingular as well, Assumption 1(ii) impliesΣ

which has positive eigenvalues not equal to one. Thus, the sign of (18) implies that there exists a positive constant c ∆ such that min k∈D c ,k<k 0

with w.p.a.1 as N, T → ∞, where c ∆ is a constant related to the difference ∆ = BΣF B ′ − CΣF C ′ .

Thus, we obtain the result P (τ = τ0) = 1. ✷

To provek − k0 = Op(1), we need to show that for any ε > 0, there exists an M > 0 such that P (|k − k0| > M ) < ε as (N, T ) → ∞. By the consistency ofτ , for any ε > 0 and min{τ0, 1−τ0} > η > 0, P (k ∈ D c η ) < ε as (N, T ) → ∞. For the given η and M , we have Dη,M = {k : (τ0 −η)T ≤ k ≤ (τ0 +η)T, |k0 −k| > M }; thus, P (|k −k0| > M ) = P (k ∈ D c η )+P (k ∈ Dη,M ). Hence, it suffices to show that for any ε > 0 and η > 0, there exists an M > 0 such that P (k ∈ Dη,M ) < ε as (N, T ) → ∞.

Again, by symmetry, it suffices to study the case of k < k0. Similar to the proof of consistency ofτ , we have

Note that

where the op(1) term is uniform in k ∈ Dη,M and k < k0 by Lemma 4.

In addition,

where the third line uses (11) and the op(1) term in the last line is uniform in k ∈ Dη,M according to Lemmas 2 (i), (v), and (vi).

Let υ k denote a uniform op(1) term in (22). For any given δ > 0, (22) implies

Let m = k0 − k,

where 0 < C < ∞ is a constant.

Similarly, (21) implies

where the fourth line follows from max (1) by Lemma 2 (ii) and the fact that (1). In addition, the last inequality holds through a similar derivation used in (23) and (24). By the continuity of g defined in (17)

holds for a sufficiently large M by (24) and

w.p.a.1 as N, T → ∞. In addition, by (25), we have P ( min

as M → ∞. Using (26) and (27), we can obtain

Let us recall (12),

For the second term in the above equation, we have

by Lemma 5. Similarly, by (11) and Lemma 2, we have

Thus,

Similarly, for the case of k > k0, the limit can be written as

Before proving the theorem, we need to prove the following lemmas, where A − denotes the MP inverse of A, ρi(A)

represents the i-th eigenvalue of matrix A, and σi(A) represents the i-th singular value of matrix A.

Proof : By symmetry, it is sufficient to focus on the case of k ∈ [k0, [τ2T ] ].

Recall that E(e ′ t es)/N = γN (s, t). Consider the equation

where the first term can be written as

under Assumption 8(i) and the second term is Op(T −2 ) because the expectation can be bounded by

where we use the facts that E( gt gu ) ≤ [E gt 2 E gu 2 ] 1/2 ≤ maxt E gt 2 under Assumptions 1 and T t=1 |γN (s, t)| ≤ M by Assumption 3(ii).

Next, consider the term

where the first term can be bounded by

by Assumption 3(v) and the second term is bounded by

Combining the results obtained in (28)-(31), we obtain the desired result. ✷ When C is singular,Σ2(k) converges in probability to a singular matrix for k ≥ k0. In finite samples, however, the smallest eigenvalue ofΣ2(k) is not zero. The following lemma establishes a lower bound for the smallest eigenvalue ofΣ2(k), which ensures that it is meaningful to compute the logarithm of |Σ2(k)| in the objective function for any given sample size.

Symmetrically, a similar lower bound can be established for the smallest eigenvalue ofΣ1(k) for k ≤ k0 when B is singular.

Because of space restrictions, Proposition 1 here only states the result for the case ofΣ2(k).

for j = r2 + 1, ..., r.

Proof :

In addition, note thatΣ ρj(Σ2(k)) ≥ min

Thus, it suffices to find the lower bound for min k∈

Now, using (32), we can obtain

Let us consider the term a 1k in (35). The term a 2k in (35) (35), we can obtain its upper bound as

For the first term in (37), we have

by Lemma 6.

For the second term in (37), we can obtain

To observe this, note that k0, [τ2T ] ] following the derivation of terms I and II in Lemma B.2 of Bai (2003) .

Thus, combining the results in (37)-(39), we have

Next, rearranging the terms in (35) 

Part 2. Note that

In addition, σj

as N, T → ∞ for j = r2 + 1, ..., r and some positive constant cU .✷

The following lemma yields a bound on the difference between |Σ 0 2 | and |Σ2(k)| for k0 < k ≤ τ2T when C is singular. The same result applies to the difference between |Σ 0 1 | and |Σ1(k)| for τ1T ≤ k < k0 when B is singular. Proof :

First, note thatΣ thus,

where

by (47) andĜ

Thus, we have

where we use the fact that A # (1) uniformly over k0 < k ≤ τ2T . The first term in (49) is Op(N −(r−r 2 ) T −1 ) because |A k | = Op(N −(r−r 2 ) ) by proposition 1, the (48) and Lemma 1, (A 1/2 k ) # is uniformly Op(N −(r−r 2 −1)/2 ) by (46), and the last term in (49) is Op(δ −2 NT N −(r−r 2 −1) T −1 ) by (46), (48), and Lemma 1. The result in (49) indicates that

under the condition N ∝ T . Recalling (45), we obtain the same rate of S1 for both r2 = r − 1 and r2 ≤ r − 2.

Term S2 in (43) is zero if r2 ≤ r − 2 because the adjoint matrix of an r × r matrix A is zero when rank(A) ≤ r − 2.

where we use the fact that max k 0 <k<τ 2 T |Σ2(k)| = Op(N −(r−r 2 ) ) by proposition 1. As both (53) and (54) are shown to be Op(T −1 N −(r−r 2 ) ), we obtain the desired result for this lemma under the condition T ∝ N . ✷

The following lemma yields a lower bound on the difference between |Σ2(k)| and |Σ 0 2 | for τ1T ≤ k < k0 when C is singular and B is either singular or nonsingular. The same result applies to the difference between |Σ1(k)| and |Σ 0 1 | for k0 < k ≤ τ2T when B is singular.

for a constant c > 0 as N, T → ∞.

Proof :

Let us rewriteΣ2(k) asΣ 2(k)

by (44). Hence, we have

given the fact that the singular values are absolute values of the eigenvalues of a symmetric matrix.

Next, for the term P2 in (56), we havê

where G ≡ [g k+1 , ..., g k 0 ] ′ , and the first line uses the fact that gt = Cft for t ≥ k0.

by

Rearranging the inequality in (60) and using (59), we obtain

where the first line is based on the fact that H # H = Ir|H|, the second line uses the inequality that the maximum singular value is bounded by the Frobenius norm, and the third line follows from the derivation below:

given the fact that max

Now, it suffices to find the lower bound of ρ1(GC ′ # Q k C # G ′ ). Using the definition of Q k and inequality ρ1(AB) ≥ ρr(A)ρ1(B) for r × r positive semidefinite matrices A and B, we have

where we use the facts thatΣ 0 f,2 ≡ 1

for a constant c1 > 0 as N, T → ∞, because C # B = 0 and Σ f is positive definite according to Assumption 11 (i). As

For k0 − k being bounded,

for a constant c > 0 according to Assumption 11 (ii), where C # Bf k 0 = 0. Combining (62), (63), and (64), we have

p.a.1 for a constant c2 > 0. Thus, we can obtain the lower bound for the RHS of (61) as

as N, T → ∞ for a constant c3 > 0, because H has a nonsingular limit.

Based on (56) and Weyl's inequality, we have

for a constant c4 > 0 as N, T → ∞ by (58) and (65), where the last line follows from proposition 1 and the Op(T −1 ) term is uniform over τ1T ≤ k < k0.

Step 2. When r − r2 ≥ 2 or r2 = 0, the term P2 in (56) is zero. For the term P1 in (56), we havê

Using similar techniques to those in (60) and (61), we have

where the last line is based on the fact that

uniformly over τ1T ≤ k < k0 under the condition N ∝ T , because of Lemma 1, and the fact that ρ1(D # k ) = |D k |/ρr(D k ) = Op(N −(r−r 2 ) )Op(N ) and

where we set ft = Σ

1/2 f εt with E(εtε ′ t ) = Ir and E ≡ [ε k+1 , ..., ε k 0 ] ′ . First, we consider the case in which k0 −k is bounded. Note that D # k has r2 eigenvalues of order Op(N −(r−r 2 ) ) and r −r2 eigenvalues of order Op(N −(r−r 2 −1) ) by (68), and let v1(r × r2) and v2(r × (r − r2)) denote the corresponding eigenvectors.

By proposition 1, for t ≤ k0,

which implies that D 1/2 k U1εt lies in the space spanned by v1. Thus, for t ≤ k0,

where the second line follows from (71) and the last line follows from (68) and (69) 

where Proj(A|Z) denotes the projection of A onto the columns of Z, the second line follows from (69), and the Op(N −(r−r 2 )/2 ) term in the third line follows from (68) (73) is also bounded away from zero and lies in the space spanned by v2, because it is, by design, orthogonal to D 1/2 k U1, which lies in the space of v1 by (71). As v2 corresponds to the Op(N −(r−r 2 −1) ) eigenvalues of D # k , we have

w.p.a.1 as N, T → ∞ by proposition 1. Thus, for k0 − k being bounded,

w.p.a.1. for a constant c1 > 0 as N, T → ∞ under the condition N ∝ T .

Second, we consider the case in which k0 − k → ∞ as N, T → ∞. Using (70), we rewrite GHD # k H ′ G ′ as

Based on the same techniques as in (60) and (61), the decomposition in (70) implies

where the second inequality is based on the fact that σ1(AB) ≤ σ1(A)σ1(B) and the last line follows from the facts that |D k | is uniformly Op(N −(r−r 2 ) ) by proposition 1, σ1(U1) ≤ U1 = Op(1) by the structure of U1, and (1) uniformly over τ1T ≤ k < k0.

Next, we need to determine the lower bound of ρ1((GH − E U1D

where F ′ ≡ [f k+1 , ..., f k 0 ] = Σ 1/2 f E ′ in the second line. Again, using the inequality in (60), we have

Thus, in combination with (76), we obtain (75) and (77) yields

w.p.a.1 for a constant c3 > 0 as N, T → ∞.

According to (67), (74), and (78), there exists a constant c4 > 0 such that 

as N, T → ∞. Summarizing the results in (66) and (79), we obtain the lower bound of ρ1 Finally, using the lower bound of the largest eigenvalue of matrix 1 T −kĜ D −1 kĜ ′ , we can rewrite (55) as

as N, T → ∞.

Comparing |D k | and |Σ 0 2 |, we have rewritten as

log ρi(Σ 0 1Σ 0−1 2 ) →+∞ at the rate log T +Op(1).

When Σ2 is singular and Σ1 is singular or nonsingular, we can bound (83) by ) → +∞ at the rate log T . In addition, when ρ1(Σ1) > 0, we have ρ1(Σ 0 1 ) →p ρ1(Σ1) > 0; thus, r log ρ1(Σ 0 1 ) = Op(1). When ρ1(Σ1) = 0 (i.e., r1 = 0), we have ρ1(Σ 0 1 ) · T ≥ c > 0 w.p.a.1 for c > 0 by proposition 1; thus, −r log ρ1(Σ 0 1 ) → +∞ at the rate log T . For ρr(Σ2), rearranging the terms in (84) yieldsΣ 

Eigenvalue ratio test for the number of factors

Panel data models with multiple time-varying individual effects

The estimation of the variances in a variance-components model

Estimation Of A Change Point In Multiple Regression Models

Estimating and testing linear models with multiple structural change

Vector autoregressive models with structural changes in regression coefficients and variance-covariance matrices

Determining the number of factors in approximate factor models

Inferential theory for factor models of large dimensions

Common breaks in means and variances for panel data

Structural changes in high dimensional factor models

Estimation and inference of change points in high-dimensional factor models

Identification and estimation of a large factor model with structural instability

Estimating and testing high dimensional factor models with multiple structural changes

Simultaneous multiple change-point andfactor analysis for high-dimensional time series

Consistent factor estimation in dynamic factor models with structural instability

Testing for structural breaks in dynamic factor models

Estimating the common break date in large factor models

Detecting big structural breaks in large factor models

Shrinkage estimation of high-Dimensional factor models with structural instabilities

Estimating number of factors by adjusted eigenvaluesthresholding

Tests for parameter instability in dynamic factor models

Estimating a common deterministic time trend break in large panels with cross sectional dependence

Numerical Analysis for Statisticians

Estimation of large dimensional factor models with an unknown number of breaks

Dynamic Sparse Factor Analysis

Determinging the number of factors from empirical distribution of eigenvalues. The review of

Estimating and testing structural changes in multivariate regressions

Forecasting in dynamic factor models subject to structural instability

Disentangling the Channels of the 2007-09 Recession

..,ĝ k ] ′ . By (7.10) of Lange (2010) , we havewhere A −1 k is reasonable because the smallest eigenvalue of N · A k is bounded away from zero by proposition 1. We now analyze the term (T − k0) −1Ĝ A −1 kĜ ′ , which can be written asFor S1,by a uniform version of Lemmas B2 and B3 of Bai (2003) . When r − r2 = 1, (44) impliesThen, it follows thatgiven the fact that 1/|A k | is uniformly Op(T ) when r − r2 = 1 by proposition 1 under the condition N ∝ T and the fact that (1) by Lemma 1.uniformly over k0 < k ≤ τ2T by proposition 1; thus, we havewhere the second term in the third line is zero because, and the last equality follows from Lemma 1 and the result that 1/|A k | = Op(N ) by proposition 1 for r = r2 + 1. Hence, under theThus, combining the results in (43), (45), (50), and (51), we obtainThus, (42) can be written aswhere we use (52) and the fact that ρ1(ĜA −1where the Op(T −1 N −(r−r 2 ) ) term is uniform over k0 < k ≤ τ2T .Next, comparing |A k | and |Σ2(k)|, we haveWe would like to find the lower bound of the largest eigenvalue of matrix 1 T −kĜ D −1 kĜ ′ , which can be written asThe subsequent proof will be performed in two steps.Step 1.for some positive constant c5 > 0. In addition,Thus,as N, T → ∞, which implies the desired result under the condition N ∝ T . ✷We first prove the consistency ofτ , thenk − k0 = Op(1), and finally,k − k0 = op(1). Again, it suffices to study the case of k < k0.To proveτ − τ0 = op (1), we need to show that for any ε > 0 and η > 0, P (|τ − τ0| > η) < ε as N, T → ∞. For any given 0 < η ≤ min(τ0, 1 − τ0), define Dη = {k : (τ0 − η)T ≤ k ≤ (τ0 + η)T } and D c η as the complement of Dη. Similar to the proof for the consistency ofτ when B and C are nonsingular, we need to show that P (k ∈ D c η ) < ε. Recalling (13) and (15), we have(1). Consider the first term k k 0 −k log |Σ1Σ 0−1 1 |. When Σ1 is of full rank, it follows thatby the argument used in (15) and Lemma 2. When Σ1 is singular, we can obtainwhere the second line follows from Lemma 7 and the third line is based on the fact that |k0 − k| > ηT .(2). For the second and third terms, letWe show that f (Σ 0 1 ,Σ 0 2 ) → +∞ at the rate log T . When Σ1 is singular and Σ2 is a positive definite matrix, (10) implieŝwhere the op(1) term is uniform over k ∈ D c η for k < k0 by Lemma 2 (iv) and (vii). Together withΣ 0−1 2 →p Σ −1 2 > 0, (84) implies that ρi(Σ2Σ 0−1 2 ) is uniformly Op(1) and bounded away from zero; thus, (1) uniformly over k ∈ D c η for k < k0. In addition, we have ρi(Σ 0 (1) uniformly over k ∈ D c η for k < k0. Combining the above results, we establish the following result: f (Σ 0 1 ,Σ 0 2 ) → +∞ at the rate log T . Together with (81) and (82), we havew.p.a.1; thus, P ( min k∈D c η ,k<k 0 U (k) − U (k0) ≤ 0) → 0 for any η > 0, and hence,τ →p τ .Next, we show thatk − k0 = Op(1).Similar to the proof of Theorem 1, for given η and M , defineHence, it suffices to show that for any ε > 0 and η > 0, there exists an M > 0 such that P (k ∈ Dη,M ) < ε as (N, T ) → ∞. Similar to (20) and (80), it suffices to show that for any given ε > 0 and η > 0, there exists an M > 0 such thatFor the term k k 0 −k log |Σ1Σ 0−1 1 |, when Σ1 is of full rank, we have P ( minwhere the second equation holds because of Lemma 7 and the last equality holds because kFor the second and third terms, we consider several cases.(i). When Σ1 is singular and Σ2 is positive definite, we havewhere the second line is based on the fact thatΣ2 = 1 (22) and (24), and the divergence rate in the last line follows from the fact that − log |Σ 0 1 | ≥ log(c1T ) for some c1 > 0 by proposition 1 under the assumption N ∝ T and the fact that log |Σ 0 2 | = Op(1) becauseΣ 0 2 →p Σ2 is positive definite.(ii). When Σ2 is singular and Σ1 is either singular or positive definite, we havewhere the inequality in the third line holds because of Lemma 8, the Op(log T ) term in the third line follows from proposition 1, and the divergence in the last line evidently holds when k0 − k → ∞ and (k0 − k)/T → 0, becauseThus, we have shown that the second and third terms dominate the first term, and hence, (85) holds.To indicate the consistency ofk, we will show that for any k < k0 and k0 − k ≤ M , the objective function V (k) = When Σ1 is of full rank, the first term in (89) isSimilar to (86), when Σ1 is singular, the first term in (89) The second and third terms are discussed below.(i). When Σ1 is a singular matrix and Σ2 is a positive matrix,where T −k k 0 −k log |Σ2Σ 0−1 2 | = Op(1) is similar to (87).(ii). When Σ2 is singular and Σ1 is either singular or positive definite, similar to (88) In summary, we can determine U (k) → ∞ when k < k0 and k0 − k < M as N, T → ∞. Thus, we prove the consistency ofk. ✷