Title: A General Framework For Constructing Locally Self-Normalized Multiple-Change-Point Tests
Authors: Cheng, Cheuk Hin; Chan, Kin Wai
Date: 2022-04-30

Abstract. We propose a general framework to construct self-normalized multiple-change-point tests for time series data. The only building block is a user-specified one-change-point detecting statistic, which covers a wide class of popular methods, including the cumulative sum (CUSUM) process, outlier-robust rank statistics, and order statistics. Neither robust and consistent estimation of nuisance parameters, selection of bandwidth parameters, nor pre-specification of the number of change points is required. The finite-sample performance shows that our proposal is size-accurate, robust against misspecification of the alternative hypothesis, and more powerful than existing methods. Case studies of NASDAQ option volume and Shanghai-Hong Kong Stock Connect turnover are provided.

Testing structural stability is of paramount importance in statistical inference. There is a considerable amount of literature on, and practical interest in, testing for the existence of change points (CPs). Csörgő and Horváth (1988, 1997) provided a comprehensive treatment of both classical parametric and non-parametric approaches to the at-most-one-change (AMOC) problem. For the detection of multiple CPs, one method is to assume a fixed or bounded number of CPs (m). The test statistics are constructed by dividing the data into segments and maximizing a sum of AMOC-type statistics computed within each segment; see Antoch and Jarušková (2013) and Shao and Zhang (2010). Since the true number of CPs (M) is usually unknown in practice, several information criteria (Yao, 1988; Lee, 1995) have been proposed for estimating M. We remark that such approaches are vulnerable to misspecification of M, and that the computational cost often scales exponentially with m. Another approach relies on sequential estimation via binary segmentation (Vostrikova, 1981; Bai and Perron, 1998). This class of methods sequentially performs AMOC-type tests and produces CP estimates whenever the no-CP null hypothesis is rejected. The same procedure is repeated on each of the two subsamples split at the estimated CP until no segment can be split further. Since the structural difference between the data before and after a particular CP may be weakened in the presence of other CPs, AMOC-type tests may inevitably lose power in the multiple-CP situation. To improve power, localization can be used. Bauer and Hackl (1980) introduced the moving sum (MOSUM) approach. Chu et al. (1995) and Eichinger and Kirch (2018) later applied MOSUM to CP detection. This method selects a local window size and performs inference on each local subsample. To avoid selecting the window size, Fryzlewicz (2014) proposed wild binary segmentation, which randomly draws subsamples without fixing the window size. Both algorithms attempt to find a subsample that contains only one CP so as to boost power without specifying m. However, the above methods require consistent and robust estimation of nuisance parameters. For example, the long-run variance (LRV) $\sigma^2$ of the partial sum usually appears in a test statistic in the form $T_n(\sigma^2) = L_n/\sigma^2$, where $L_n$ satisfies $L_n \stackrel{d}{\to} \sigma^2 L$ under the null for some pivotal distribution $L$, and "$\stackrel{d}{\to}$" denotes convergence in distribution.
Unfortunately, estimating $\sigma^2$ can be non-trivial and may require tuning a bandwidth parameter (Andrews, 1991). Worse still, Kiefer and Vogelsang (2005) found that different selected bandwidths can noticeably influence the tests. To avoid the challenging estimation of $\sigma^2$, Lobato (2001) first proposed to use a self-normalizer (SN) $V_n$ that satisfies $V_n \stackrel{d}{\to} \sigma^2 V$ for some pivotal $V$, so that $V_n$ cancels out the nuisance $\sigma^2$ in $L_n$ to form a self-normalized test statistic $T_n = L_n/V_n \stackrel{d}{\to} L/V$ under the null. Shao and Zhang (2010) later modified the SN for handling the AMOC problem. In the multiple-CP problem, Zhang and Lavitas (2018) proposed a self-normalized test which intuitively scans for the first and last CPs. However, it may sacrifice power when larger changes take place in the middle. Moreover, applying their proposed test to CP location estimation is non-trivial.

In this paper, we propose a general framework combining self-normalization and localization to construct a powerful multiple-CP test, which is applicable to a broad class of time series models including both short-range and long-range dependent data. Our method also allows the use of outlier-robust change detecting statistics to improve size accuracy and power. The proposed method is driven by principles that generalize a one-CP test statistic to a multiple-CP test statistic. Our proposed test is shown to be locally powerful asymptotically. A recursive formula is derived for fast computation.

The remaining part of the paper is structured as follows. We first review the self-normalization idea proposed by Shao and Zhang (2010) for the AMOC problem in Section 2. Our proposal is demonstrated using the CUSUM process as motivation in Section 3, where the limiting distribution, consistency, and a local power analysis are also provided. In Section 4, detailed principles and the most general form of our proposed test are presented, with examples applying rank and order statistics; extensions to testing for structural changes in other parameters, and to long-range dependent and multivariate time series, are also provided. In Section 5, the differences among self-normalized approaches are comprehensively discussed, together with some implementation issues. Simulation experiments are presented in Section 6, which show promising size accuracy and substantial improvement in power over existing methods. The paper ends with real-data analyses of the NASDAQ option volume in Section 7.1 and the Shanghai-Hong Kong Stock Connect turnover in Section 7.2. Proofs of the main results, additional simulation results, critical values, recursive formulas, and algorithms can be found in the Appendix.

Consider the signal-plus-noise model: $X_i = \mu_i + Z_i$, where $\mu_i = E(X_i)$ for $i = 1, \ldots, n$ and $\{Z_i : i \in \mathbb{Z}\}$ is a stationary noise sequence. The AMOC testing problem is formulated as follows:
$H_0: \mu_1 = \cdots = \mu_n$, (2.1)
$H_1: \exists\, 0 < k_1 < n,\ \mu_1 = \cdots = \mu_{k_1} \neq \mu_{k_1+1} = \cdots = \mu_n$. (2.2)
Let $\xi_{k_1,k_2} = \sum_{i=k_1}^{k_2} X_i$ if $1 \le k_1 \le k_2 \le n$, and $\xi_{k_1,k_2} = 0$ otherwise. Define the CUSUM process
$C_n(\lfloor nt \rfloor) = n^{-1/2}\{\xi_{1,\lfloor nt \rfloor} - (\lfloor nt \rfloor / n)\,\xi_{1,n}\}$, $t \in [0,1]$, (2.3)
where $\lfloor nt \rfloor$ denotes the integer part of $nt$. The limiting distribution of functionals of (2.3) relies on the following assumption.

Assumption 2.1. As $n \to \infty$, $\{n^{-1/2}\sum_{i=1}^{\lfloor nt \rfloor} Z_i : t \in [0,1]\} \Rightarrow \{\sigma B(t) : t \in [0,1]\}$, where $\sigma^2 = \lim_{n\to\infty} \mathrm{Var}(\xi_{1,n})/n \in (0, \infty)$ is the long-run variance (LRV), $B(\cdot)$ is the standard Brownian motion, and "$\Rightarrow$" denotes convergence in distribution in the Skorokhod space (Billingsley, 1999).
Assumption 2.1 is known as a functional central limit theorem (FCLT) or Donsker's invariance principle. Under standard regularity conditions (RCs), Assumption 2.1 is satisfied. For example, Herrndorf (1984) proved the FCLT for dependent data under some mixing conditions, and Wu (2007) established strong invariance principles for dependent random variables. Classically, the celebrated Kolmogorov-Smirnov (KS) test statistic, defined as $\mathrm{KS}_n(\sigma) = \sup_{k=1,\ldots,n} |C_n(k)/\sigma|$, can be used for the AMOC problem. Since $\sigma$ is typically unknown, we need to estimate it by an estimator $\hat\sigma$ that is consistent under both $H_0$ and $H_1$; see Chan (2022) and Chan (2022+) for some possible estimators. Hence, $\mathrm{KS}_n(\hat\sigma) \stackrel{d}{\to} \mathrm{KS} := \sup_{t\in(0,1)} |B(t) - tB(1)|$, which follows the Kolmogorov distribution.

Shao and Zhang (2010) proposed to bypass the estimation of $\sigma$ by normalizing $C_n(k)$ by a non-degenerate standardizing random process called a self-normalizer:
$V_n(k) = n^{-2}\{\sum_{i=1}^{k} (\xi_{1,i} - \tfrac{i}{k}\xi_{1,k})^2 + \sum_{i=k+1}^{n} (\xi_{i,n} - \tfrac{n-i+1}{n-k}\xi_{k+1,n})^2\}$, $k = 1, \ldots, n-1$.
The resulting self-normalized test statistic for the AMOC problem is $S^{(1)}_n$, where
$S^{(1)}_n = \sup_{k=1,\ldots,n-1} S^{(1)}_n(k)$ and $S^{(1)}_n(k) = C_n^2(k)/V_n(k)$. (2.4)
Under Assumption 2.1 and $H_0$, the limiting distribution of $S^{(1)}_n$ is non-degenerate and pivotal. The nuisance parameter $\sigma^2$ is asymptotically cancelled out between the numerator $C_n^2(k)$ and the denominator $V_n(k)$ on the right-hand side of (2.4). Moreover, since there is no CP within the intervals $[1, k_1]$ and $[k_1+1, n]$ under $H_1$, the SN at the true CP, $V_n(k_1)$, is invariant to the change, and therefore their proposed test does not suffer from the well-known non-monotonic power problem; see, e.g., Vogelsang (1999). However, this appealing feature no longer exists in the multiple-CP setting. Zhang and Lavitas (2018) made an improvement by proposing a self-normalized multiple-CP test without specifying m. The test utilizes forward and backward scans to select two time intervals, $[1, k_1]$ and $[k_2, n]$ for $1 < k_1, k_2 < n$, and aims at detecting the largest change. Applying the SN of Shao and Zhang (2010) to each of these two subsamples, their test was proved to be consistent. However, the method tends to consider only the first and the last CPs, which affects power, as will be shown in our simulation studies in Section 6. In the next section, we propose a framework for extending a one-CP test to a valid multiple-CP self-normalized test. It is achieved by utilizing the localization idea.

We consider the multiple-CP testing problem, i.e., testing the $H_0$ in (2.1) against
$H_{\geq 1}: \mu_1 = \cdots = \mu_{k_1} \neq \mu_{k_1+1} = \cdots = \mu_{k_2} \neq \cdots \neq \mu_{k_M+1} = \cdots = \mu_n$, (3.1)
for an unknown number of CPs $M \geq 1$ and unknown times $1 < k_1 < \cdots < k_M < n$. Also let $\pi_j = k_j/n$ and $\Delta_j = \mu_{k_j+1} - \mu_{k_j}$ denote the jth relative CP time and the corresponding mean change, respectively, for $j = 1, \ldots, M$. Since M is usually unknown and possibly larger than one in practice, the multiple-change alternative (3.1) is more sensible than the one-change alternative (2.2). To improve the power of (2.3), we generalize the CUSUM process to a localized CUSUM statistic. Define it and its SN as
$L^{(C)}_n(k \mid s, e) = n^{-1/2}\{\xi_{s,k} - \tfrac{k-s+1}{e-s+1}\,\xi_{s,e}\}$, (3.2)
$V^{(C)}_n(k \mid s, e) = n^{-2}\{\sum_{i=s}^{k} (\xi_{s,i} - \tfrac{i-s+1}{k-s+1}\xi_{s,k})^2 + \sum_{i=k+1}^{e} (\xi_{i,e} - \tfrac{e-i+1}{e-k}\xi_{k+1,e})^2\}$, (3.3)
respectively, where $1 \le s \le k \le e \le n$. The indices s and e denote the starting and ending times of a local subsample $\{X_s, \ldots, X_e\}$, respectively. The statistic $L^{(C)}_n(k \mid s, e)$ is used to detect a local difference between $\{X_s, \ldots, X_k\}$ and $\{X_{k+1}, \ldots, X_e\}$. Indeed, (3.2) generalizes the global CUSUM process (2.3) because $C_n(k) \equiv L^{(C)}_n(k \mid 1, n)$. Moreover, according to (3.2), the localized CUSUM process allows fast recursive computation as it is a function of $C_n(\cdot)$.
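To fix ideas, the following NumPy sketch implements the global CUSUM process, the AMOC self-normalized statistic (2.4), and the localized CUSUM (3.2) as reconstructed above. The function names, brute-force loops, and exact normalizing constants are ours, not the authors', so this is an illustration rather than the paper's reference implementation.

```python
import numpy as np

def cusum(x):
    """Global CUSUM process: C_n(k) = n^{-1/2} * (xi_{1,k} - (k/n) * xi_{1,n}), k = 1..n-1."""
    n = len(x)
    s = np.cumsum(x)                      # s[k-1] = xi_{1,k}
    k = np.arange(1, n)
    return (s[:-1] - k / n * s[-1]) / np.sqrt(n)

def sn_amoc(x):
    """AMOC self-normalized statistic S_n^{(1)} = max_k C_n(k)^2 / V_n(k), cf. (2.4)."""
    n = len(x)
    c = cusum(x)
    best = 0.0
    for k in range(1, n):                 # candidate CP time (1-based)
        left, right = x[:k], x[k:]
        i = np.arange(1, k + 1)
        v = np.sum((np.cumsum(left) - i / k * left.sum()) ** 2)
        j = np.arange(1, n - k + 1)
        rev = np.cumsum(right[::-1])      # xi_{i,n}, scanned backwards
        v += np.sum((rev - j / (n - k) * right.sum()) ** 2)
        v /= n ** 2                       # normalization per our reconstruction of V_n(k)
        if v > 0:
            best = max(best, c[k - 1] ** 2 / v)
    return best

def local_cusum(x, k, s, e):
    """Localized CUSUM L_n^{(C)}(k | s, e) on the subsample {X_s, ..., X_e} (1-based, s <= k < e)."""
    n = len(x)
    seg = x[s - 1:e]
    return (seg[:k - s + 1].sum() - (k - s + 1) / (e - s + 1) * seg.sum()) / np.sqrt(n)
```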
The locally self-normalized (LSN) statistic is then defined as
$T^{(C)}_n(k \mid s, e) = L^{(C)}_n(k \mid s, e)^2 / V^{(C)}_n(k \mid s, e)$ if $V^{(C)}_n(k \mid s, e) > 0$, and $T^{(C)}_n(k \mid s, e) = 0$ otherwise;
see Remark 3.1 for a non-SN approach. The LSN statistic generalizes Shao and Zhang (2010)'s proposal in the sense that $S^{(1)}_n(k) \equiv T^{(C)}_n(k \mid 1, n)$. To infer how likely k is a potential CP location, we consider all the symmetric local windows that center at k. The resulting score function is
$T^{(C)}_n(k) = \max_{\lfloor \varepsilon n \rfloor \le d \le \min(k-1,\, n-k-1)} T^{(C)}_n(k \mid k-d, k+1+d)$, (3.5)
where $0 < \varepsilon < 1/2$ is a fixed local tuning parameter, which is similarly used in, e.g., Huang et al. (2015) and Zhang and Lavitas (2018); see Remark 3.2 for a discussion of the choice of symmetric windows. In particular, they suggested choosing $\varepsilon = 0.1$; this suggestion is followed throughout this article. In essence, (3.5) compares the local subsamples of length $d+1$ before and after time k, i.e., $S_{\mathrm{before}} = \{X_{k-d}, \ldots, X_k\}$ and $S_{\mathrm{after}} = \{X_{k+1}, \ldots, X_{k+1+d}\}$, for each possible width d that is not too small. The statistic $T^{(C)}_n(k)$ records the change corresponding to the most obvious d. We visualize our proposed score function $T^{(C)}_n(\cdot)$ and Shao and Zhang (2010)'s proposal $S^{(1)}_n(\cdot)$ defined in (2.4) in Figure 3.1 when they are applied to a time series with 5 CPs. Zhang and Lavitas (2018)'s method is not included because it does not map a time k to a score.

[Figure 3.1: (b) the score function $S^{(1)}_n(k)$ employed by Shao and Zhang (2010); (c) the proposed score function $T^{(C)}_n(k)$.]

Clearly, $T^{(C)}_n(\cdot)$ attains local maxima near all true CP locations, while $S^{(1)}_n(\cdot)$ does not. Possible reasons are the reduced CP detection capacity of the global CUSUM in the presence of other changes, the non-monotonic CP configuration, and the non-robust SN. By design, the score function can be extended to CP estimation; see Remark 3.3. Finally, aggregating the scores $T^{(C)}_n(k)$ across k, we obtain the proposed test statistic:
$T^{(C)}_n = \frac{1}{n - 2\lfloor \varepsilon n \rfloor - 1} \sum_{k=\lfloor \varepsilon n \rfloor + 1}^{n - \lfloor \varepsilon n \rfloor - 1} T^{(C)}_n(k)$, (3.6)
which captures the effects of all potential CPs instead of merely the first and the last CPs as in Zhang and Lavitas (2018). Hence, this statistic is potentially more powerful for testing multiple CPs than the existing state-of-the-art SN test (Zhang and Lavitas, 2018). Define $0 < \pi_1 < \cdots < \pi_M < 1$ to be the $M \geq 1$ relative CP times, so the CPs occur at times $\lfloor n\pi_1 \rfloor, \ldots, \lfloor n\pi_M \rfloor$. For convenience, denote $\pi_0 = 0$ and $\pi_{M+1} = 1$. Theorem 3.1 below states the limiting distribution of $T^{(C)}_n$ and the consistency of the test.

Theorem 3.1 (Limiting distribution and consistency). Suppose Assumption 2.1 is satisfied. Under $H_0$, $T^{(C)}_n$ converges in distribution to a pivotal limit. Under $H_{\geq 1}$, if there exist $j \in \{1, \ldots, M\}$ and $C > 0$ such that the jth CP time satisfies $\varepsilon < \min(\pi_j - \pi_{j-1}, \pi_{j+1} - \pi_j)$ and the jth change magnitude satisfies $|\Delta_j| = |\mu_{k_j+1} - \mu_{k_j}| > C$ as $n \to \infty$, then $\lim_{n\to\infty} T^{(C)}_n = \infty$ in probability.

Under $H_0$, the test statistic $T^{(C)}_n$ has a pivotal limiting distribution, whose quantiles can easily be found by simulation; see the tables in Section D of the appendix.

Theorem 3.2 (Local limiting power). Under $H_{\geq 1}$, parametrize the jth change magnitude $|\Delta_j(n)| = |\mu_{k_j+1} - \mu_{k_j}|$ as a function of n. If there exist $j \in \{1, \ldots, M\}$ and $0.5 < \kappa \le 1$ such that the jth CP satisfies $\varepsilon < \min(\pi_j - \pi_{j-1}, \pi_{j+1} - \pi_j)$ and $|\Delta_j(n)| \asymp n^{\kappa - 1}$ as $n \to \infty$, then $\lim_{n\to\infty} T^{(C)}_n = \infty$ in probability.

The smallest change magnitude under which the test remains powerful corresponds to $\kappa > 0.5$. This result is in line with Zhang and Lavitas (2018). Indeed, from the proof of Theorem 3.2, it is easy to see that CP tests based on the CUSUM process have asymptotic power one only if the order of the change magnitude is larger than $n^{-1/2}$. Therefore, our proposed test preserves the change detecting ability of the CUSUM process.
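Putting (3.2)-(3.6) together, a direct O(n^3) translation of the proposed statistic reads as follows; the O(n^2) recursive implementation is deferred to the appendix. The normalizing constants again follow our reconstruction, so comparing against the tabulated critical values in Section D would require the paper's exact convention.

```python
import numpy as np

def lsn_score(x, k, eps=0.1):
    """Score T_n^{(C)}(k) of (3.5): maximize the locally self-normalized squared CUSUM
    over symmetric window half-widths d (k is 0-based here)."""
    n = len(x)
    d_lo, d_hi = int(eps * n), min(k, n - k - 2)
    best = 0.0
    for d in range(d_lo, d_hi + 1):
        w = x[k - d:k + 2 + d]            # window {X_{k-d}, ..., X_{k+1+d}} of length 2(d+1)
        m = d + 1                         # candidate split inside the window
        num = (w[:m].sum() - 0.5 * w.sum()) ** 2 / n          # squared localized CUSUM (3.2)
        i = np.arange(1, m + 1)
        v = np.sum((np.cumsum(w[:m]) - i / m * w[:m].sum()) ** 2)
        rev = np.cumsum(w[m:][::-1])
        v = (v + np.sum((rev - i / m * w[m:].sum()) ** 2)) / n ** 2   # local SN (3.3)
        if v > 0:
            best = max(best, num / v)
    return best

def lsn_test_stat(x, eps=0.1):
    """Aggregated statistic T_n^{(C)} of (3.6): mean of the scores over admissible k."""
    n = len(x)
    b = int(eps * n)
    return float(np.mean([lsn_score(x, k, eps) for k in range(b, n - b - 1)]))
```

With $\varepsilon = 0.1$, lsn_test_stat scans every admissible time k, contrasts the two symmetric halves of each window, and averages the most extreme self-normalized contrast per k.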
Remark 3.1 (Non-SN approach). Users may instead use a consistent estimator of the LRV, say $\hat\sigma^2_n$, to replace the SN (3.3), $V^{(C)}_n(k \mid k-d, k+1+d)$, for all k and d. The performance is expected to be affected by the robustness of the CP estimators (if any) and the bandwidth parameter, as discussed in Kiefer and Vogelsang (2005). Alternatively, the LRV can be estimated robustly by splitting the dataset as in (3.3). However, the estimator of $\sigma^2$ would then become very noisy due to the small subsample sizes. This extra variability in estimating $\sigma^2$ may accumulate and eventually ruin the test statistic. Our approach takes the variability into account automatically, because the same fluctuations enter both the numerator and the denominator of the self-normalized ratio.

Remark 3.2 (Symmetric windows). Define the score function based on non-symmetric windows as $\tilde T^{(C)}_n(k)$, the maximum of $T^{(C)}_n(k \mid k-d_1, k+1+d_2)$ over all admissible pairs of widths $(d_1, d_2)$. The corresponding test statistic is $\tilde T^{(C)}_n = \sum_k \tilde T^{(C)}_n(k)/(n - 2\lfloor \varepsilon n \rfloor - 1)$. To balance computational cost and power, we suggest using the symmetric-window score $T^{(C)}_n(k)$ instead of $\tilde T^{(C)}_n(k)$, i.e., setting the local subsamples before and after the target time k, $S_{\mathrm{before}}$ and $S_{\mathrm{after}}$, to have the same width d. Test consistency only requires the existence of one CP in the local windows, and the change can still be captured under symmetric windows without significant power loss. Using non-symmetric windows significantly increases the computational cost and is not expected to bring a vast improvement in power. Some simulation evidence is provided in Sections 5.1 and 6.4.

Remark 3.3 (CP time estimation). The score function $T^{(C)}_n(\cdot)$ in (3.5) can be used for CP estimation; see Figure 3.1(c) for a visualization. Motivated by the GOALS algorithm (Jiang et al., 2022+), the set of estimated CP times can be obtained by finding the local maximizers of the score function whose scores exceed a decision threshold (3.11), where the threshold is of order $o(n)$. The estimation may be improved by integrating other screening algorithms to perform model selection in order to avoid overestimating the number of CPs, for example, the screening and ranking algorithm (SaRa) (Niu and Zhang, 2012) and the scanning procedure proposed by Yau and Zhao (2016). Alternatively, it is possible to apply our proposed test to stepwise estimation, for example, binary segmentation (BS) (Vostrikova, 1981), wild binary segmentation (Fryzlewicz, 2014), etc.; see Section E in the appendix for the details of integrating our proposed scanning procedure with SaRa and BS.

The proposed test statistic $T^{(C)}_n$ in (3.6) sheds light on how multiple-CP tests should be performed. In particular, we combine the advantages of localization and self-normalization. Although $T^{(C)}_n$ demonstrates appealing features, as we see in Figure 3.1, it is not fully general in two respects, stated below. In this section, we lay out the underlying principles that define a general class of statistics for testing multiple CPs.

First, $T^{(C)}_n$ is a functional of the global CUSUM process $C_n(\cdot)$. Thus, potential CPs are detected via differences between local averages of the $X_i$'s, which, however, may not be the best choice, in particular for data with heavy-tailed distributions. Generally, one may consider any global change detecting process $\{D_n(\lfloor nt \rfloor)\}_{0 \le t \le 1}$ for the AMOC problem. Applying the localization trick (3.2) and the self-normalization trick (3.3) to $D_n(\cdot)$ instead of $C_n(\cdot)$, we obtain the generalized LSN statistic $T^{(D)}_n(k \mid s, e)$. If $D_n(\cdot)$ converges weakly in the Skorokhod space to $\sigma_D D(\cdot)$ for some $\sigma_D \in (0, \infty)$, where $D(\cdot)$ is an empirical process whose distribution is pivotal, then $T^{(D)}_n(k \mid d)$ is asymptotically pivotal as in Theorem 3.1.
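As a concrete instance of swapping in a different $D_n(\cdot)$ (cf. Example 4.1 below), the sketch here builds a Wilcoxon-type rank process; this particular scaling is our illustrative choice, not necessarily the paper's exact definition.

```python
import numpy as np

def wilcoxon_process(x):
    """A Wilcoxon-type detecting process D_n(k), k = 1..n-1: centered cumulative
    rank sums, whose scaled version behaves like a Brownian bridge in the limit
    under short-range dependence."""
    n = len(x)
    r = np.argsort(np.argsort(x)) + 1.0   # ranks of X_i within the full sample
    rs = np.cumsum(r)
    k = np.arange(1, n)
    return (rs[:-1] - k * (n + 1) / 2.0) / n ** 1.5
```

Localizing and self-normalizing this process exactly as in (3.2)-(3.6) yields a rank-based multiple-CP test that inherits the outlier robustness discussed below.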
Examples of $D_n(\cdot)$, based on Wilcoxon statistics, Hodges-Lehmann statistics, and influence functions, are provided in Examples 4.1-4.3. Extensions to long-range dependent and multivariate time series are provided in Examples 4.4 and 4.5, respectively.

Second, there are two free indices, d and k, in the LSN statistic. The maximum operator in (3.5) and the mean operator in (3.6) are used to aggregate the indices d and k, respectively. The index d is introduced to let the data choose the window sizes. We remark that the performance of the MOSUM statistic, which is a special case of the localized CUSUM with a fixed window size, depends critically on the window size (Eichinger and Kirch, 2018). Hence, they suggested considering multiple window sizes for robustness. We follow their suggestion and improve performance by aggregating over all possible window sizes. Generally, users may choose their preferred aggregation operators, e.g., median, trimmed mean, etc. Formally, for any finite set of indices $S \subset \mathbb{Z}$ of size $|S|$, we say that $\mathcal{P}$ is an aggregation operator if $\mathcal{P}_{i \in S}\, a_i$ maps $\{a_i \in \mathbb{R} : i \in S\}$ to a real number; e.g., if $\mathcal{P} = \max$, then $\mathcal{P}_{i \in S}\, a_i = \max_{i \in S} a_i$ gives the maximum, and if $\mathcal{P} = \mathrm{mean}$, then $\mathcal{P}_{i \in S}\, a_i = \sum_{i \in S} a_i / |S|$ gives the sample mean. Hence, the most general form of our proposed test statistic is
$T^{(D)}_n[\mathcal{P}^{(1)}, \mathcal{P}^{(2)}] = \mathcal{P}^{(2)}_{k}\, \mathcal{P}^{(1)}_{d}\, T^{(D)}_n(k \mid k-d, k+1+d)$,
where $\mathcal{P}^{(1)}$ and $\mathcal{P}^{(2)}$ are aggregation operators over the admissible values of d and k, respectively. In particular, $T^{(C)}_n[\max, \mathrm{mean}]$ is used in (3.6). It is practically convenient to use weighted versions, e.g., $\mathcal{P}^{(1)}_{d}\, a_d = \max_{d}\, w^{(1)}_d a_d$ and $\mathcal{P}^{(2)}_{k}\, a_k = \sum_k w^{(2)}_k a_k$ with the weights $w^{(2)}_k$ summing to one; see Example 4.5 for the construction of the corresponding global change detecting process.

For testing changes in a general parameter $\theta$, let $\hat\theta_{s:e}$ denote an estimator of $\theta$ computed from the subsample $\{X_s, \ldots, X_e\}$, and consider the global change detecting process
$G_n(k) = \frac{k(n-k)}{n^{3/2}}\,(\hat\theta_{1:k} - \hat\theta_{k+1:n})$. (4.3)
Clearly, if $\hat\theta_{i:j} = (j-i+1)^{-1}\sum_{t=i}^{j} X_t$, then $G_n(\cdot) = C_n(\cdot)$. So, (4.3) generalizes (2.3) from testing changes in the $\mu_i$'s to testing changes in a general parameter. Suppose $\hat\theta_{s:e}$ is asymptotically linear in the following sense, as $e - s \to \infty$:
$\hat\theta_{s:e} = \theta + \frac{1}{e-s+1}\sum_{i=s}^{e} \mathrm{IF}(X_i \mid \theta, F_h) + R_{s:e}$, (4.4)
where $R_{s:e}$ is a remainder term and $\mathrm{IF}(y \mid P, F_h) = \lim_{\epsilon \to 0} \{P((1-\epsilon)F_h + \epsilon\delta_y) - P(F_h)\}/\epsilon$ is the influence function; see Section 2.3 of Wasserman (2006). The asymptotic linear representation (4.4) is known as a nonparametric delta method.

Example 4.4 (Long-range dependent time series). Long-range dependence (LRD) is widely used for modeling time series data in, e.g., earth sciences, econometrics, and traffic systems; see Palma (2007) for a review. A time series exhibits LRD if its autocovariances satisfy
$\gamma(k) \sim k^{-D}\,\phi(k)$ (4.5)
as $k \to \infty$, where $D \in (0, 1)$ is the LRD parameter and $\phi(\cdot)$ is a function that is slowly varying at infinity; see, e.g., Pipiras and Taqqu (2017). In this section, we discuss two possible change detecting processes for handling LRD. First, the global CUSUM process (2.3) with suitable normalization converges under the LRD setting; specifically, by Theorem 5.1 in Taqqu (1975), the scaled CUSUM process $C^*_n(\cdot)$ converges weakly. Second, a Wilcoxon-type process $W^*_n(\cdot)$ may be used, with resulting test statistic $T^{(W^*)}_n$. Here, the data are modeled as $X_i = \mu_i + h(U_i)$, where $\{U_i\}$ is a stationary zero-mean standard Gaussian process whose autocovariances satisfy (4.5), and h is a measurable strictly monotone function such that $E\{h(U_i)\} = 0$. Under $H_0$, the limit holds for any $0 < \varepsilon < 1/2$, where $\mathcal{L}_H$ is defined as $\mathcal{L}$ in (3.9) but with $B(\cdot)$ replaced by the fractional Brownian motion $B_H(\cdot)$, and $\mathcal{V}_H$ is defined as $\mathcal{V}$ in (3.9) but with $\mathcal{L}(\cdot \mid \cdot, \cdot)$ replaced by $\mathcal{L}_H(\cdot \mid \cdot, \cdot)$. In practice, the unknown parameter H can be consistently estimated through regression in the spectral domain (Palma, 2007) by, e.g., the local Whittle method (Künsch, 1987). Shao and Wu (2007) showed the consistency of the local Whittle estimator.

Example 4.5 (Multi-dimensional parameter). Our approach extends to the multivariate case naturally.
Let $\{Q_n(\lfloor nt \rfloor)\}_{0 \le t \le 1}$ be a q-dimensional CP detecting process for the AMOC problem, where $q \in \mathbb{N}$. For example, for testing changes in a q-dimensional parameter $\Theta$, we can use a multivariate version of (4.3):
$Q_n(k) = \frac{k(n-k)}{n^{3/2}}\,(\hat\Theta_{1:k} - \hat\Theta_{k+1:n})$,
where $\hat\Theta_{s:e}$ is the estimator of $\Theta$ using the subsample $\{X_s, \ldots, X_e\}$. We define the corresponding LSN statistic with $A^{\otimes 2} = AA^{\mathsf{T}}$ (for any matrix A) replacing the squares in the SN. Aggregating $T^{(Q)}_n(k \mid s, e)$ as in (3.5) and (3.6), we obtain the test statistic $T^{(Q)}_n := T^{(Q)}_n[\max, \mathrm{mean}]$. Its limiting distribution is stated as follows: if the standardized partial-sum process of the noise converges weakly to a q-dimensional standard Brownian motion $B^{(q)}(\cdot)$, then, under $H_0$, we have $T^{(Q)}_n \stackrel{d}{\to} T^{(q)}$ for any $0 < \varepsilon < 1/2$, where $\mathcal{L}^{(q)}$ is defined as $\mathcal{L}$ in (3.9) but with $B(\cdot)$ replaced by $B^{(q)}(\cdot)$, and $\mathcal{V}^{(q)}$ is defined as $\mathcal{V}$ in (3.9) but with $\mathcal{L}(\cdot \mid \cdot, \cdot)^2$ replaced by $\mathcal{L}^{(q)}(\cdot \mid \cdot, \cdot)^{\otimes 2}$. Shao and Zhang (2010) proposed a general AMOC self-normalized change-point test for the mean of a multivariate time series. Our approach localizes their proposed CP detecting process and self-normalizer for multiple-CP detection. Note that their proposed SN requires repeated calculation of the quantities of interest on each subsample, while our SN reuses the global change detecting process, which allows faster computation.

5 Discussion and Implementation

5.1 Comparison with existing methods

Shao and Zhang (2010) extended the self-normalized one-CP test (2.4) to a "supervised" multiple-CP test tailored for testing m CPs, where m is pre-specified. The test statistic $S^{(m)}_n$ maximizes a sum of segment-wise self-normalized statistics over configurations $1 = k_0 < k_1 < \cdots < k_m < k_{m+1} = n$ ranging over a trimmed region $\Omega^{(m)}_n(\varepsilon)$, which prevents computing estimates from too few observations. Zhang and Lavitas (2018) later proposed an "unsupervised" self-normalized multiple-CP test, $Z_n$, which does not require specifying m. This approach is also used by Antoch and Jarušková (2013) for constructing a non-self-normalized multiple-CP test. It imposes strict control on the m local windows, since the boundaries of a window relate to the previous and next windows. If the number of CPs M is misspecified, some of the windows may fail to contain exactly one CP. If $m < M$, the self-normalizers are not robust to the changes; if $m > M$, some degrees of freedom are wasted on detecting CPs that do not exist. Both cases may lead to a substantial loss of power. Zhang and Lavitas (2018)'s approach sets the left end of the window to be 1 in the forward scan and the right end to be n in the backward scan. So, their approach tends to scan for the first CP $k_1 \in [1, e]$ and the last CP $k_M \in [s, n]$ for some e, s; see Section 6.4. In contrast, our approach scans for all possible CPs. We also allow the windows to start and end at any times instead of restricting them to start at time 1 or end at time n.

Besides, Jiang et al. (2022+) recently considered a self-normalized CP estimation problem for a quantile regression model, while we consider a general CP testing problem. Since we solve different inference problems, the results are not directly comparable, so we only compare the self-normalizers in the two approaches. First, their self-normalizer is specifically designed for the quantile regression model, whereas ours is built on a general framework which supports any global change detecting process; see Section 4. One of our major contributions is to extend any one-CP test to a multiple-CP self-normalized test. Second, computing their self-normalizer requires fitting the quantile regression model $O(n^2)$ times on all possible local subsamples.
In contrast, our self-normalizer is a function of a global change detecting process, which can be computed by fitting the models on only $O(n)$ local subsamples. This significantly reduces the computational burden. Although we focus on different problems, our proposed test statistic can be integrated into the first step of their algorithm. The fused algorithm has the potential to further enhance generality and computational flexibility. We leave it for future study.

Computational efficiency is usually a major concern in constructing multiple-CP tests. Figure 5.1(b) shows that the computational cost of using non-symmetric windows in $\tilde T^{(C)}_n$ is significantly larger than that of $T^{(C)}_n$ because of the larger feasible set in computing the score function (3.10). However, the gain in power is insignificant; see Section 6.4. The incremental power improvement does not justify the huge additional time cost. Therefore, using symmetric windows is recommended. Table 5.1 compares the methods, including (e) the requirements of computing an LRV estimate $\hat\sigma^2_n$ and of specifying a target number of CPs m. The comparisons in (a) and (b) are based on our simulation results in Section 6 and the appendix.

In the time series context, the accuracy of the invariance principle (i.e., Assumption 2.1) may deteriorate when the sample size is small (Kiefer and Vogelsang, 2005). Therefore, the asymptotic theory in, e.g., Theorem 3.1 may not kick in, which may lead to severe size distortion. To mitigate this problem, finite-sample adjusted critical values can be used. We propose to compute a critical value $c_\alpha(n, \rho)$ by matching the sample size n and the lag-one autocorrelation $\rho$ for each specified level of significance $\alpha \in (0, 1)$. The values of $c_\alpha(n, \rho)$ are tabulated in Section D of the appendix for $\alpha \in \{0.01, 0.05, 0.1\}$, $n \in \{100, 200, \ldots, 1000, 2000, 3000, \ldots, 10000\}$, and $\rho \in \{0, \pm 0.1, \ldots, \pm 0.9\}$. The testing procedure is outlined as follows.

1. Compute $\hat\rho$, an estimate of the lag-one autocorrelation based on the differenced data.
2. Obtain the critical value $c_\alpha(n, \hat\rho)$; use interpolation if necessary.
3. Reject $H_0$ at level $\alpha$ if $T^{(\aleph)}_n > c_\alpha(n, \hat\rho)$, where $\aleph \in \{C, W, H, G\}$.

The limiting critical value is approximated by the finite-n critical value with $n = 10000$. Since $\hat\rho$ relies on the differenced data with a suitable lag, the change points (if any) have a negligible effect on $\hat\rho$. The consistency of $\hat\rho$ is developed under the following framework. Let $X_i = \mu_i + Z_i$, where the $\mu_i$'s are deterministic and the $Z_i$'s are zero-mean stationary noises. Define $Z_i = g(\ldots, \varepsilon_{i-1}, \varepsilon_i)$, where the $\varepsilon_i$'s are independent and identically distributed (iid) random variables and g is some measurable function. Let $\varepsilon_i'$ be an iid copy of $\varepsilon_i$ and $Z_i' = g(\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$. For $p > 1$ and $\|Z_i\|_p < \infty$, the physical dependence measure (Wu, 2011) is defined as $\delta_{i,p} = \|Z_i - Z_i'\|_p$. In particular, if $b = \lfloor n^{1/3} \rfloor$ and $|\Delta_*| < \infty$, then Theorem 5.1 guarantees that $\hat\rho$ is consistent for $\rho$ provided that the actual number of CPs satisfies $M = o(n^{2/3})$. This testing procedure does not assume any parametric structure, as we only intend to match the sample ACF at lag one. Although it is possible to further match the ACF at higher lags, our proposal balances accuracy and computational cost. Simulation results show that the tests have accurate size under the above critical value adjustment procedure. We emphasize that this adjustment only affects the finite-sample performance because the critical value $c_\alpha(n, \rho)$ is a constant as a function of $\rho$ as $n \to \infty$.
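The adjustment can be mimicked by Monte Carlo as sketched below. Using an AR(1) family to match the lag-one ACF is our own illustrative assumption (the paper only specifies that n and ρ are matched), and lsn_test_stat is the statistic sketched in Section 3.

```python
import numpy as np

def critical_value(stat_fn, n, rho, alpha=0.05, reps=2000, burn=100, seed=0):
    """Monte Carlo critical value c_alpha(n, rho): simulate null data whose noise is
    AR(1) with lag-one autocorrelation rho, and return the (1 - alpha) empirical
    quantile of the test statistic.  The AR(1) matching family is an assumption."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        e = rng.standard_normal(n + burn)
        z = np.empty(n + burn)
        z[0] = e[0]
        for t in range(1, n + burn):      # AR(1) recursion with burn-in
            z[t] = rho * z[t - 1] + e[t]
        stats[r] = stat_fn(z[burn:])      # no change points under H_0
    return float(np.quantile(stats, 1.0 - alpha))

# Usage: reject H_0 at the 5% level if the observed statistic exceeds
#   critical_value(lsn_test_stat, n=200, rho=0.3)
```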
6 Simulation experiments

6.1 Setting and overview

Throughout Section 6, the experiments are designed as follows. The time series is generated from a signal-plus-noise model: $X_i = \mu_i + Z_i$. The values of the $\mu_i$'s will be specified in Sections 6.2 and 6.3. The zero-mean noises $Z_i$'s are simulated from a stationary bilinear autoregressive (BAR) model:
$Z_i = \varrho Z_{i-1} + \vartheta Z_{i-1}\varepsilon_{i-1} + \varepsilon_i$,
where the $\varepsilon_i$'s are iid standard normal random variables and $\varrho, \vartheta \in \mathbb{R}$ satisfy $\varrho^2 + \vartheta^2 < 1$. Similar conclusions are obtained under other noise models, including the autoregressive-moving-average (ARMA) model, the threshold AR (TAR) model, and the absolute-value nonlinear AR (NAR) model. Due to space constraints, their results are deferred to the appendix. The critical value is chosen according to the adjustment procedure described in Section 5.2.

In this subsection, we examine the size and power of different CP tests when various numbers of CPs exist. Following the suggestion of Huang et al. (2015) and Zhang and Lavitas (2018), we choose $\varepsilon = 0.1$ in $S^{(2)}_n$, $S^{(3)}_n$, $Z_n$, and our proposals in order to have a fair comparison. Suppose that the CPs are evenly spaced and the directions of the mean changes alternate (increasing or decreasing):
$\mu_i = \Delta \sum_{j=1}^{M} (-1)^{j-1}\,\mathbf{1}\{i > \lfloor jn/(M+1) \rfloor\}$, (6.1)
where $M \in \{1, \ldots, 6\}$ denotes the number of CPs and $\Delta \in \mathbb{R}$ controls the magnitude of the mean changes. Clearly, $\mu_i \equiv 0$ under $H_0$. All tests are computed at nominal size $\alpha = 5\%$. The null rejection rates $\hat\alpha$, for sample sizes $n \in \{200, 400\}$, are presented in Table 6.1. To summarize the results, we further report the sample root mean squared error (RMSE) of $\hat\alpha$ over all cases of $\vartheta$ and $\varrho$ for each test and each n. The self-normalized tests generally have more accurate size than the non-self-normalized test $\mathrm{KS}_n$. This finding agrees with Shao (2015): the self-normalized approach is a special case of the fixed-bandwidth approach with the largest possible bandwidth and thus achieves the smallest size distortion, according to Kiefer and Vogelsang (2002). The advantage of the rank- and order-based proposals can be attributed to the outlier robustness achieved by rank and order statistics. This advantage is particularly obvious for bilinear time series, as this model is well known to produce sudden high-amplitude oscillations that mimic structures in, e.g., explosion and earthquake data in seismology; see Section 5.2 of Rao and Gabr (1984). However, under the more standard ARMA models, $T^{(C)}_n$ performs as well as $T^{(W)}_n$ and $T^{(H)}_n$; see the appendix for the detailed results. The power curve against $\Delta$ is computed at 5% nominal size using $2^{10}$ replications with $n = 200$. Figure 6.1 shows the size-adjusted power when $\varrho = \vartheta = 0.5$.

[Table 6.1: Null rejection rates $\hat\alpha$ at nominal size $\alpha = 5\%$ under the BAR model and mean function (6.1). The sample RMSE of $\hat\alpha$ over all cases of $\vartheta$ and $\varrho$ is computed for each test and each n. The values of $\hat\alpha$ and RMSE are expressed in percentages. The smaller the RMSE, the better the test controls the type-I error.]

The supervised tests may lose power if the boundaries of the local windows are off from the actual CPs. It is also interesting to see that, compared with $S^{(1)}_n$, the tests $S^{(2)}_n$ and $S^{(3)}_n$ are less sensitive to misspecification of M, although they are still less powerful than our proposals.

This subsection investigates why our proposed tests have higher power over a broader range of CP structures than the existing unsupervised multiple-CP test $Z_n$. Consider the 3-CP setting in which the magnitudes of the changes at the first and the last CPs are only half that of the second CP.
Specifically, we consider two mean functions:
• (Case 1) equal magnitudes: $\mu_i = \Delta(\mathbf{1}\{i > \lfloor n/4 \rfloor\} - \mathbf{1}\{i > \lfloor 2n/4 \rfloor\} + \mathbf{1}\{i > \lfloor 3n/4 \rfloor\})$; and
• (Case 2) unequal magnitudes: $\mu_i = \Delta(\tfrac{1}{2}\mathbf{1}\{i > \lfloor n/4 \rfloor\} - \mathbf{1}\{i > \lfloor 2n/4 \rfloor\} + \tfrac{1}{2}\mathbf{1}\{i > \lfloor 3n/4 \rfloor\})$.
The comparison under other noise models is provided in the appendix. Compared to Case 1, $Z_n$ and $S^{(2)}_n$ lose roughly half of their power in Case 2. In contrast, the power of our tests is only reduced by about 1/3 when $\Delta \le 1$, and remains about the same when $\Delta \ge 2$. Therefore, our tests are more powerful and more robust to mean change structures than $Z_n$. This is because our proposals are able to capture all change structures, whereas $Z_n$ only takes the first and the last CPs into account; see (5.2).

To balance computational cost and power, we propose to use symmetric windows when constructing the LSN statistic; see Remark 3.2. Intuitively, changes can still be detected by the contrast between samples of the same size. In this subsection, we study the power loss of $T^{(C)}_n$ due to the use of symmetric windows. We consider three-CP alternatives with changes evenly and unevenly spaced in order to compare $T^{(C)}_n$ and $\tilde T^{(C)}_n$. Specifically, the mean function with evenly spaced changes is Case 1 defined in Section 6.3, while the unevenly spaced model places the change times off the even grid. The resulting powers of the two statistics are nearly identical, which indicates that using non-symmetric windows does not bring a significant advantage in power while it drastically increases the computational burden; see Section 5.1. Therefore, we recommend symmetric windows to balance power and computational cost.

7 Real data analysis

7.1 NASDAQ call option volume

We perform a CP analysis of Bloomberg's NASDAQ call option volume index (Bloomberg code: OPCVNQC Index) from 24 March 2017 to 21 March 2022 (n = 1259). The data are retrieved from Bloomberg. Researchers and practitioners are interested in using the option volume to analyze the stock market. Pan and Poteshman (2006) investigated market price adjustment and identified strong evidence of informed trading in the option market. Johnson and So (2012) found that the option-to-stock-volume (O/S) ratio is significant in predicting the magnitude and sign of the abnormal return resulting from earnings surprises, consistent with the finding that the option volume reflects private information. Further dividing the option volume into categories, Ge et al. (2016) found that a higher O/S ratio in general predicts lower stock returns, since more components of the option volume negatively predict returns. Since the option volume is a significant factor in market analysis, a CP analysis of the mean option volume is of direct interest. The test statistics and p-values are presented in Table 7.1. Since our proposed tests reveal strong evidence against the hypothesis that no CPs exist, we further conduct mean CP location estimation as defined in (3.11); the estimates are indicated by red vertical lines, including one in January 2020 and one on 2 November 2020. The test using Wilcoxon statistics leads to similar results, while the test using Hodges-Lehmann statistics does not detect the first CP. The most dominant CP is the second one, at the end of January 2020. Studies have found a surge in trading activities worldwide and observed a trading-from-home effect (Chiah and Zhong, 2020; Ortmann et al., 2020). Chiah and Zhong (2020) explained that the sudden growth in trading activity could be attributed to some retail investors regarding the stock market as a gambling substitute.
The jump in trading volume may also result from changing growth expectations and investors' reactions to the post-COVID-19 stimulus policies.

7.2 Shanghai-Hong Kong Stock Connect turnover

Related studies examined the cointegration between the stock markets and concluded that the volatility spillover effect is strengthened. The results of Chan et al. (2018) further support that the Stock Connect turnover is significant for market volatility. As an indicator of foreign investment, the Stock Connect turnover is important for market volatility modeling. Therefore, we study whether CPs exist in the Stock Connect Southbound daily turnover. We investigate whether changes in trend exist in the log daily turnovers. Under our framework, it suffices to define a global change detecting process, which makes the analysis straightforward for practitioners. We consider the global change detecting process defined in (4.3) with two possible choices of $\hat\theta_{1:k}$ and $\hat\theta_{k+1:n}$, specified below.

• (Mean trend) $\alpha^-_k, \beta^-_k \in \mathbb{R}$ are the intercept and slope of the mean trend at or before time k, respectively, whereas $\alpha^+_k, \beta^+_k \in \mathbb{R}$ are the corresponding values after time k. Then $\hat\theta_{1:k}$ and $\hat\theta_{k+1:n}$ are defined as the ordinary least-squares estimators of $\beta^-_k$ and $\beta^+_k$, respectively.
• (Quantile trend) The parameters are similarly interpreted as in the mean trend model. Then $\hat\theta_{1:k}$ and $\hat\theta_{k+1:n}$ are estimators of $\beta^-_k$ and $\beta^+_k$, respectively; see the modified Barrodale and Roberts algorithm (Koenker and d'Orey, 1987). The estimators are computed by the R package "quantreg".

Our proposed method improves upon existing CP tests and brings several advantages: (i) it has high power and controls size well; (ii) neither specification of the number of CPs nor consistent estimation of nuisance parameters is required; (iii) CP location estimates can be naturally produced; (iv) general change detecting statistics, e.g., rank and order statistics, are allowed, which enhances robustness; (v) it is capable of testing changes in general parameters of interest other than the mean; and (vi) both short-range and long-range dependence are supported. Table 5.1 summarizes the properties of the proposed tests. Besides, our proposal is driven by intuitive principles, so that, as long as a one-change detecting statistic is provided, our proposal can generalize it to a multiple-change detecting statistic. We anticipate that our framework can be applied to non-time-series data, e.g., spatial data and spatio-temporal data, in future work.

Proof of Theorem 3.1. (i) Under $H_0$, the LSN statistic converges to a non-negative and non-degenerate distribution for any $\lfloor \varepsilon n \rfloor \le d$ and $\lfloor \varepsilon n \rfloor + 1 \le k \le n - \lfloor \varepsilon n \rfloor - 1$, provided that $\varepsilon > 0$. (ii) For consistency, we consider the jth relative CP time $\pi_j$ such that the corresponding change magnitude satisfies $|\Delta_j| \asymp n^{\kappa-1}$, where $0.5 < \kappa \le 1$. Denote $\Delta_* = \Delta_j$ and $k^* = \lfloor n\pi_j \rfloor$. Under the assumption, there exist $c \in \mathbb{R}^+$ and $3/2 - \kappa \le \Upsilon < 1$ such that $\varepsilon + cn^{\Upsilon}/n < \min(\pi_j - \pi_{j-1}, \pi_{j+1} - \pi_j)$ is satisfied for large enough n. Therefore, there is only one CP in the interval $k^* \pm (\lfloor \varepsilon n \rfloor + cn^{\Upsilon})$. It suffices to consider $d = \lfloor \varepsilon n \rfloor$. For all relevant k, the localized CUSUM satisfies $L^{(C)}_n(k \mid k-d, k+1+d) = \Xi_{1,n} + O_p(1)$, where $|\Xi_{1,n}| \asymp n^{\kappa-1/2}$. (A.1) Next, for the SN $V^{(C)}_n(k \mid k-d, k+d+1)$, we consider two cases: $k \ge k^*$ and $k < k^*$. If $k \ge k^*$, there is only one CP in the interval $[k-d, k]$. Following calculations similar to (A.1), we can bound $L^{(C)}_n(j \mid k-d, k)^2$ for all $j = k-d, \ldots, k$, provided that n is large enough. Moreover, since there is no CP in the interval $[k+1, k+1+d]$, $L^{(C)}_n(j \mid k+1, k+1+d)^2 = O_p(1)$ for all $j = k+1, \ldots, k+1+d$. Therefore, $V^{(C)}_n(k \mid k-d, k+1+d) = \Xi_{2,n} + O_p(1)$, where $|\Xi_{2,n}| \asymp n^{\Upsilon+\kappa-3/2}$. For the case $k < k^*$, the analysis is similar, and $L^{(C)}_n$ and $V^{(C)}_n$ are of the same orders.
From the above, there exists a constant $c' > 0$ such that, for large enough n, the aggregated score is at least of order $c' n^{1-\Upsilon}$ with probability tending to one. Consequently, $T^{(C)}_n \to \infty$ in probability as $n \to \infty$ because $\Upsilon < 1$.

Proof of Theorem 5.1. Let "$\stackrel{pr}{\to}$" denote convergence in probability. Since $\{\mu_{t+b} - \mu_t\}_{1 \le t \le N}$ has at most bM non-zero elements, the corresponding mean-change term converges to 0 in probability by the absolute summability of $\gamma(k)$, so the autocovariance estimator is asymptotically unaffected by the CPs. For the first term of (A.5), we have, by Theorems 7 and 8 in Wu (2011), the required convergence in probability; by similar arguments, the remaining terms also converge.

Proof of Corollary 4.3. Using (4.4), we can rewrite
$G_n(k) = \frac{k(n-k)}{n^{3/2}}(\hat\theta_{1:k} - \hat\theta_{k+1:n}) = I_n(t) - \frac{k}{n} I_n(1) + \frac{k(n-k)}{n^{3/2}}(R_{1:k} - R_{k+1:n})$,
where $I_n(\cdot)$ denotes the scaled partial-sum process of the influence functions. The last term vanishes uniformly over k as $n \to \infty$ since $\sup_{k=1,\ldots,n-1}(|R_{1:k}| + |R_{k+1:n}|)$ is suitably small.

Proof of Corollaries 4.4 and 4.5. The statistic $T^{(\aleph)}_n$ is a continuous and measurable functional of $C^*_n(\cdot)$ and $W^*_n(\cdot)$, respectively. The above arguments also apply, and the limiting distribution is a functional of the standard fractional Brownian motion $B_H(t)$ instead of the standard Brownian motion $B(t)$.

A power comparison among all the tests concerned when the magnitudes of the changes are not equal is reported in Section C.8. Last but not least, the power comparison under non-symmetric windows is presented in Section C.9. All results are obtained using $2^{10}$ replications, and the nominal size is 5%. The sample size is $n = 200$ unless specified otherwise. We define the $\varepsilon_i$'s to be independent standard normal random variables for $i = 1, \ldots, n$. We again consider the signal-plus-noise model $X_i = \mu_i + Z_i$ and let $\mu_i$ be the signal specified in equation (6.1) of the main article, where $M \in \{1, \ldots, 6\}$ denotes the number of CPs and $\Delta \in \mathbb{R}$ is the magnitude of the mean changes.

In this subsection, we investigate the performance under the AR(1) model. Specifically, the data take the form $X_i - \mu_i = \varrho(X_{i-1} - \mu_{i-1}) + \varepsilon_i$, where the AR parameter satisfies $|\varrho| < 1$. The null rejection rates are documented in Table C.1. Our proposed tests have accurate size in most cases.

The ARMA model is defined as $X_i - \mu_i = \varrho(X_{i-1} - \mu_{i-1}) + \varepsilon_i + \varphi\varepsilon_{i-1}$, where $\varrho, \varphi \in (-1, 1)$. The size accuracy is reported in Table C.2. The self-normalized approach generally has more accurate size than the non-self-normalized approach. Roughly speaking, $\mathrm{KS}_n$ is seriously over-sized in most cases, while the self-normalized tests are under-sized in some cases. Overall, $S^{(1)}_n$ has the most accurate size, although some size distortion of $S^{(m)}_n$ is observed.

The threshold AR model is defined as $X_i - \mu_i = \varrho_1(X_{i-1} - \mu_{i-1})\mathbf{1}\{X_{i-1} < \mu_{i-1}\} + \varrho_2(X_{i-1} - \mu_{i-1})\mathbf{1}\{X_{i-1} \ge \mu_{i-1}\} + \varepsilon_i$. In this subsection, we report the performance under different choices of $\varrho_1$ and $\varrho_2$; see Table C.3.

The data under the NAR model take the form $X_i - \mu_i = \varrho|X_{i-1} - \mu_{i-1}| + \varepsilon_i$ with $\varrho \in (0, 1)$. From Table C.4, similar conclusions apply.

Supplementing the results in the main text, we further report the power when $\vartheta = 0.8, -0.8$. Our proposed tests in general control the size well. We also observe that the non-self-normalized KS test has larger size distortion than the self-normalized tests. The results agree with Kiefer and Vogelsang (2002) and Shao (2015). Our proposed tests also have larger power compared with the state-of-the-art self-normalized multiple-CP tests, $S^{(2)}_n$, $S^{(3)}_n$, and $Z_n$. Under the BAR model, our proposed tests even outperform the AMOC tests, $S^{(1)}_n$ and $\mathrm{KS}_n$, in the single-CP case. Moreover, the oracle tests $S^{(2)}_n$ and $S^{(3)}_n$ have smaller power compared with our proposed tests even when the number of CPs is correctly specified. This demonstrates that knowing the exact number of CPs does not give a significant advantage in CP detection.
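For reference, the sketch below simulates the noise models used in these experiments. The TAR and NAR recursions (and the symbol chosen for the AR coefficient) follow our reconstruction of the garbled displays, so the parameterizations should be checked against the original paper.

```python
import numpy as np

def simulate_noise(n, model="bar", a=0.5, b=0.5, burn=200, rng=None):
    """Stationary noise Z_i under (our reconstruction of) the models of Sections 6 and C:
      ar:   Z_i = a*Z_{i-1} + e_i
      arma: Z_i = a*Z_{i-1} + e_i + b*e_{i-1}
      tar:  Z_i = a*Z_{i-1}*1{Z_{i-1} < 0} + b*Z_{i-1}*1{Z_{i-1} >= 0} + e_i
      nar:  Z_i = a*|Z_{i-1}| + e_i
      bar:  Z_i = a*Z_{i-1} + b*Z_{i-1}*e_{i-1} + e_i   (requires a^2 + b^2 < 1)
    """
    rng = rng or np.random.default_rng()
    e = rng.standard_normal(n + burn)
    z = np.zeros(n + burn)
    for i in range(1, n + burn):
        if model == "ar":
            z[i] = a * z[i - 1] + e[i]
        elif model == "arma":
            z[i] = a * z[i - 1] + e[i] + b * e[i - 1]
        elif model == "tar":
            z[i] = (a if z[i - 1] < 0 else b) * z[i - 1] + e[i]
        elif model == "nar":
            z[i] = a * abs(z[i - 1]) + e[i]
        elif model == "bar":
            z[i] = a * z[i - 1] + b * z[i - 1] * e[i - 1] + e[i]
        else:
            raise ValueError(f"unknown model: {model}")
    return z[burn:]
```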
The finite-n adjusted critical values $c_\alpha(n, \rho)$ for $T^{(\aleph)}_n$ at sizes $\alpha \in \{0.1, 0.05, 0.01\}$ are presented in Tables D.1-D.3.

Given any input change detecting process $\{Y_n(k)\}_{1 \le k \le n}$, the self-normalizer (SN) using symmetric windows is defined as
$V^{(Y)}_n(k \mid d) \propto \sum_{j=k-d}^{k} L^{(Y)}_n(j \mid k-d, k)^2 + \sum_{j=k+1}^{k+1+d} L^{(Y)}_n(j \mid k+1, k+1+d)^2$,
with the normalizing constant as displayed in the decomposition below.

Algorithm E.2 (Binary segmentation with the proposed test).
Input:
(i) $X_1, \ldots, X_n$ — the data set;
(ii) $T^{(C)}_n(\cdot \mid X)$ — the proposed score function when a dataset X of size n is used;
(iii) $p_n(X)$ — the p-value function when a dataset X of size n is used;
(iv) $p_0$ — the p-value threshold deciding whether CPs exist in the sample;
(v) $\lfloor \varepsilon n \rfloor$ — the minimum window size.
Function getCP(s, e):
  if $p(X_s, \ldots, X_e) < p_0$ and $e - s + 1 \ge \lfloor \varepsilon n \rfloor$ then
    return the most obvious CP location $\{\arg\max_{s \le k \le e} T^{(C)}_{e-s+1}(k \mid X_{s:e})\}$;
  else
    return the empty set $\{\}$.
begin
  Compute $\Pi = \tilde\Pi = \mathrm{getCP}(1, n)$.
  while $\tilde\Pi \neq \emptyset$ do
    $\Pi_0 = \emptyset$
    foreach $\hat k \in \tilde\Pi$ do
      Set $\hat k^- = \max\{\Pi \cap [1, \hat k]\}$ and $\hat k^+ = \min\{\Pi \cap [\hat k + 1, n]\}$.
      Update $\Pi_0 \leftarrow \Pi_0 \cup \mathrm{getCP}(\hat k^-, \hat k) \cup \mathrm{getCP}(\hat k + 1, \hat k^+)$.
    Update $\tilde\Pi \leftarrow \Pi_0$ and $\Pi \leftarrow \Pi \cup \Pi_0$.
  return $\Pi$ — the selected set of estimated CP locations.

For a given k, we want to recursively calculate the SN for $d > 0$. Define $S_{YY}(k) = \sum_{i=1}^{k} Y_n(i)^2$, $S_Y(k) = \sum_{i=1}^{k} Y_n(i)$, and $S_{IY}(k) = \sum_{i=1}^{k} i\,Y_n(i)$. The SN can be decomposed as
$V^{(Y)}_n(k \mid d) = \frac{n}{4(d+1)^2}\big[(d+1)^2\{S_{YY}(k+1+d) - S_{YY}(k-d-1)\} - \cdots\big]$,
where the omitted terms involve only $S_Y(\cdot)$ and $S_{IY}(\cdot)$ evaluated at the indices $k-d-1$, k, and $k+1+d$. Moreover, writing $L^{(Y)}_n(k \mid k-d, k+1+d) = L^{(Y)}_n(k \mid d)$ for short, the squared localized process is likewise a function of these prefix sums. Both $L^{(Y)}_n(k \mid d)^2$ and $V^{(Y)}_n(k \mid d)$ require only $O(1)$ time to update as d increases with k fixed. Therefore, the score at time k, $T^{(Y)}_n(k) = \sup_{\lfloor \varepsilon n \rfloor \le d} L^{(Y)}_n(k \mid d)^2 / V^{(Y)}_n(k \mid d)$, can be computed recursively in $O(n)$ time. Consequently, the whole aggregation takes $O(n^2)$ computational complexity.

[Table D.2: the finite-n adjusted critical values $c_\alpha(n, \rho)$ of $T^{(\aleph)}_n$ at the 5% nominal size. Table D.3: the finite-n adjusted critical values $c_\alpha(n, \rho)$ of $T^{(\aleph)}_n$.]

Algorithm E.1 (SaRa for CP location estimation).
Input: (i) $X_1, \ldots, X_n$ — the data set; (ii) $T^{(C)}_n(\cdot)$ — the proposed score function; (iii) $C(\cdot)$ — a criterion function for CP time selection.
Compute the initial set of estimated CP locations $\hat\Pi \equiv \{\hat k_1, \ldots\}$; then select $\hat m = \arg\min_{1 \le j \le \bar m} C(\hat k_{(1)}, \ldots, \hat k_{(j)})$.

If $H_0$ is rejected, the proposed score function used in the calculation can be reused and integrated into the screening and ranking algorithm (Niu and Zhang, 2012) to obtain CP estimates; see Algorithm E.1, where $h = \lfloor \varepsilon n \rfloor$ and the criterion function C can be the maximum likelihood (Yao, 1987), least-squares estimation (Lavielle and Moulines, 2000), the minimum description length (Davis et al., 2006), etc. We can also incorporate our CP test and CP location estimator (3.11) into the popular binary segmentation algorithm (Vostrikova, 1981) to produce a potentially better set of CP location estimates; see Algorithm E.2 above. The stopping criteria of Algorithm E.2 are governed by the p-value threshold $p_0$ and the minimum window size $\lfloor \varepsilon n \rfloor$ in getCP.
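To illustrate the recursive bookkeeping, the sketch below computes every score in O(n) time per k from the prefix sums. Because the closed-form display of V above is only partially recoverable, the expansion here follows our reconstruction of the localization identity $L(j \mid s, e) = Y(j) - Y(s-1) - \tfrac{j-s+1}{e-s+1}\{Y(e) - Y(s-1)\}$, and the overall scale constant may differ from the paper's.

```python
import numpy as np

def scores_recursive(y, eps=0.1):
    """All scores T_n^{(Y)}(k) from a detecting process y[0..n] with y[0] = Y_n(0) = 0.
    Each within-half sum of squared localized values is evaluated in O(1) time from
    the prefix sums S_Y, S_IY, S_YY, so one score costs O(n) and all scores O(n^2)."""
    y = np.asarray(y, dtype=float)
    n = len(y) - 1
    S_Y = np.concatenate(([0.0], np.cumsum(y[1:])))
    S_IY = np.concatenate(([0.0], np.cumsum(np.arange(1, n + 1) * y[1:])))
    S_YY = np.concatenate(([0.0], np.cumsum(y[1:] ** 2)))

    def sum_sq_L(s, e):
        # sum_{j=s}^{e} {Y(j) - a - slope*(j-s+1)}^2 with a = Y(s-1), slope = (Y(e)-a)/m
        m = e - s + 1
        a = y[s - 1]
        slope = (y[e] - a) / m
        sy = S_Y[e] - S_Y[s - 1]
        syy = S_YY[e] - S_YY[s - 1]
        sjy = (S_IY[e] - S_IY[s - 1]) - (s - 1) * sy   # sum over j of (j-s+1)*Y(j)
        sj = m * (m + 1) / 2.0
        sjj = m * (m + 1) * (2 * m + 1) / 6.0
        return (syy + m * a ** 2 + slope ** 2 * sjj
                - 2 * a * sy - 2 * slope * sjy + 2 * a * slope * sj)

    b = max(1, int(eps * n))
    scores = np.zeros(n + 1)
    for k in range(b + 1, n - b):                      # admissible 1-based CP times
        best = 0.0
        for d in range(b, min(k - 1, n - k - 1) + 1):  # O(1) work per half-width d
            lk = y[k] - y[k - d - 1] - 0.5 * (y[k + 1 + d] - y[k - d - 1])
            v = sum_sq_L(k - d, k) + sum_sq_L(k + 1, k + 1 + d)
            if v > 0:
                best = max(best, lk ** 2 / v)
        scores[k] = best
    return scores
```

Feeding in the CUSUM values $(C_n(0), C_n(1), \ldots, C_n(n))$, with $C_n(0) = C_n(n) = 0$, reproduces the symmetric-window scores of Section 3 up to the reconstructed scale constant.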
References

Heteroskedasticity and autocorrelation consistent covariance matrix estimation
Testing for multiple change points
Estimating and testing linear models with multiple structural changes
Shanghai-Hong Kong Stock Connect: An analysis of Chinese partial stock market liberalization impact on the local and foreign markets
An extension of the MOSUM technique for quality control
Testing for change-points in long-range dependent time series by means of a self-normalized Wilcoxon test
Convergence of probability measures
Stock market volatility and trading volume: A special case in Hong Kong with Stock Connect turnover
Mean-structure and autocorrelation consistent covariance matrix estimation
Optimal difference-based variance estimators in time series: A general framework (2022+)
Trading from home: The impact of COVID-19 on trading volume around the world
MOSUM tests for parameter constancy
Nonparametric methods for changepoint problems. Handbook of Statistics
Limit Theorems in Change-point Analysis
Structural break estimation for nonstationary time series models
Change-point detection under dependence based on two-sample U-statistics
A robust method for shift detection in time series (2020)
Non-parametric change-point tests for long-range dependent data
A MOSUM procedure for the estimation of multiple random change points
Wild binary segmentation for multiple change-point detection
Why does the option to stock volume ratio predict stock returns?
Coronavirus: Impact on stock prices and growth expectations
A functional central limit theorem for weakly dependent sequences of random variables
On self-normalization for censored dependent data
Return and volatility spillovers effects: Evaluating the impact of Shanghai-Hong Kong Stock Connect
Modelling the COVID-19 infection trajectory: A piecewise linear quantile trend model (2022+)
The option to stock volume ratio and future returns
Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation
A new asymptotic theory for heteroskedasticity-autocorrelation robust tests
Algorithm AS 229: Computing regression quantiles
Statistical aspects of self-similar processes
Least-squares estimation of an unknown number of shifts in a time series
Estimating the number of change points in a sequence of independent normal random variables
Testing that a dependent process is uncorrelated
The screening and ranking algorithm to detect DNA copy number variations
COVID-19 and investor behavior
Long-Memory Time Series: Theory and Methods
The information in option volume for future stock prices
An Introduction to
Financial liberalization and stock market cross-correlation: MF-DCCA analysis based on Shanghai-Hong Kong Stock Connect
A self-normalized approach to confidence interval construction in time series
A simple test of changes in mean in the possible presence of long-range dependence
Self-normalization for time series: a review of recent developments
Local Whittle estimation of fractional integration for nonlinear processes
Testing for change points in time series
Weak convergence to fractional Brownian motion and to the Rosenblatt process
Sources of nonmonotonic power when testing for a shift in mean of a dynamic time series
Detection disorder in multidimensional random processes
Inference for change points in high-dimensional data via self-normalization
All of Nonparametric Statistics
Strong invariance principles for dependent random variables
Asymptotic theory for stationary processes
Stock market openness and market quality: evidence from the Shanghai-Hong Kong Stock Connect program
Approximating the distribution of the maximum likelihood estimate of the change-point in a sequence of independent random variables
Estimating the number of change-points via Schwarz' criterion
Inference for multiple change points in time series via likelihood ratio scan statistics
Unsupervised self-normalized change-point testing for time series
Adaptive inference for change points in high-dimensional data