Cui, Yan; Yang, Jun; Zhou, Zhou. State-domain Change Point Detection for Nonlinear Time Series Regression. 2019-04-24.

Change point detection in time series has attracted substantial interest, but most of the existing results have focused on detecting change points in the time domain. This paper considers the situation where nonlinear time series have potential change points in the state domain. We apply a density-weighted anti-symmetric kernel function to the state domain and thereby propose a nonparametric procedure to test the existence of change points. When the existence of change points is affirmative, we further introduce an algorithm to estimate the number of change points together with their locations. Theoretical results for the proposed detection and estimation procedures are given, and a real dataset is used to illustrate our methods.

Consider the following state-domain nonlinear auto-regression

X_i = µ(X_{i-1}) + ε_i, i = 1, 2, · · · , n, (1)

where µ(·) is an unknown regression function and {ε_i} is a martingale difference sequence such that E[ε_i | (ε_{i-1}, ε_{i-2}, · · ·)] = 0 almost surely. Special cases of Eq. (1) include threshold AR models (Tong 1990), exponential AR models (Haggan & Ozaki 1981) and ARCH models (Engle 1982), among others. Furthermore, Eq. (1) can be viewed as a discretized version of the diffusion model

dX_t = µ(X_t) dt + dM(t), (2)

where µ(·) is the instantaneous return or drift function and {M(t)} is a continuous-time martingale. In the literature, the special case of Model (2) with dM(t) = σ(X_t) dB(t) has been widely discussed to understand and model nonlinear temporal systems in economics and finance, where B(t) denotes the standard Brownian motion and σ²(·) is understood as the volatility function. Among others, Stanton (1997), Chapman & Pearson (2000) and Fan & Zhang (2003) considered the nonparametric estimation of µ(·) and σ²(·).
Zhao (2011) addressed the model validation problem for Eq. (2). In particular, Eq. (2) can be used to model the temporal dynamics of financial data with {X_t} being interest rates, exchange rates, stock prices or other economic quantities. Among others, Zhao & Wu (2006) considered kernel quantile estimates of Eq. (2) for the exchange rates between the Pound and the USD. Liu & Wu (2010) constructed simultaneous confidence bands for µ(·) and σ(·) with the U.S. Treasury yield curve rates data. See also the latter papers for further references. Observe that we allow the error process in (1) to be general martingale differences, which significantly expands the applicability of our theory and methodology in economic applications. As pointed out by one referee, conditional moment restrictions in dynamic economic models routinely arise from Euler/Bellman equations in dynamic programming and are martingale differences. Furthermore, asset returns, by the no-arbitrage theory, are (semi)martingales; hence their (demeaned) returns are martingale differences. Throughout this article, following Chapter 6.3 of Fan & Yao (2003), we shall call (1) a state-domain nonlinear regression model. The term "state domain" originated from the celebrated state-space models (e.g. Kalman (1960) and Shumway & Stoffer (2000, Chapter 6)), where the dynamics of a sequence of state variables ({X_i} in Eq. (1)) are driven by a group of control variables ({ε_i} in Eq. (1)) through the nonlinear state equation (1). Therefore, in this article the term "state domain" refers to the Euclidean space in which the variables on the axes are the state variables. Observe that the state-domain nonlinear regression (1) aims to characterize the relationship between X_i and past values (states) of the time series through a discretized stochastic differential equation. In contrast, time-domain nonlinear regression (see e.g.
Fan & Yao (2003), Chapter 6.2)

X_i = f(i/n) + ε_i, i = 1, 2, · · · , n, (3)

with E[ε_i] = 0 describes the relationship between X_i and time. To date, most investigations of nonparametric inference procedures for Eq. (1) are based on the assumption that the underlying regression function µ(·) is continuous, which may cause serious restrictions in many real applications. In fact, in parametric modeling of nonlinear time series, various choices of µ(·) with possible discontinuities have drawn much attention in the literature. One of the most prominent examples is the threshold model proposed by Tong & Lim (1980), in which regime switches are triggered by an observed variable crossing an unknown threshold. AR models with regime switches controlled by a Markov chain mechanism were also introduced by Tong (1990). In economics, the expanding phase and contracting phase are not always governed by the same dynamics; see Durlauf & Johnson (1995), McConnell & Perez-Quiros (2000), Tiao & Tsay (1994) and other references therein. As a result, the occurrence of abrupt changes in the state-domain regression function µ(·) is common, and detecting as well as estimating them is of vital importance. Motivated by this, in the current paper we focus on the situation where the regression function µ(·) is piecewise smooth on an interval of interest T = [l, u] with a finite but unknown number of change points. More precisely, there exist l = a_0 < a_1 < · · · < a_M < a_{M+1} = u such that µ(·) is smooth on each of the intervals [a_0, a_1), · · · , [a_M, a_{M+1}]; that is, on the interval [l, u],

µ(x) = Σ_{j=0}^{M} µ_j(x) 1(a_j ≤ x < a_{j+1}), (4)

where M is the total number of change points. Throughout this article, we assume M is fixed. To our knowledge, there exist no results on change point detection for the state-domain regression function µ(·) in the literature. The purpose of this paper is twofold.
First, we want to test whether µ(x) is smooth or discontinuous on the interval [l, u]; that is, to test the null hypothesis H_0 : M = 0 of Eq. (4). By sliding a density-weighted anti-symmetric kernel through the state domain, we shall suggest a nonparametric test statistic and non-trivially apply the discretized multivariate Gaussian approximation result of Zaitsev (1987) to establish its asymptotic distribution. Additionally, the Gaussian approximation results also directly suggest a finite-sample, simulation-based bootstrap method which improves the accuracy of the test in practical implementations. Second, if M ≥ 1, we reject the null hypothesis and subsequently want to locate all the change points. In this case, we propose an estimation procedure and establish the corresponding asymptotic theory on the accuracy of the estimators. Finally, the above theoretical results are of general interest and could be used for a wider class of state-domain change point detection problems. There is a long-standing literature in statistics discussing jump detection in the time-domain nonlinear regression model (3), where occasional jumps occur in an otherwise smoothly changing time trend f(·). A complete review is beyond our scope, so we only list some representative works. Müller (1992) and Eubank & Speckman (1994) employed kernel methods to estimate jump points in smooth curves. Wang (1995) suggested using wavelets and provided a review of jump-point estimation. Two-step methods were considered by Müller & Song (1997) and Gijbels et al. (1999) to study the asymptotic convergence properties of the jump estimators. Later, Gijbels et al. (2007) suggested a compromise estimation method which can preserve possible jumps in the curve. Zhang (2016) considered the situation where the trend function allows a growing number of jump points.
In econometrics, there is a significant body of literature discussing time-domain jump detection in jump diffusion models; see, for instance, Bollerslev et al. (2008), Jiang & Oomen (2008), Lee & Mykland (2012) and the references therein. On the other hand, it is well known that state-domain asymptotic theory is very different from that of the time domain (see, for instance, Fan & Yao (2003), Chapter 6). In our specific case, the uniform asymptotic behavior of our test statistic on [l, u] is arguably more difficult to establish than the corresponding problem in the time domain. In the current paper, we establish that, unlike time-domain change point detection problems for (3), where the long-run variances of the process are of crucial importance in the asymptotics, state-domain change point detection theory for (1) depends heavily on the conditional variances and densities of the process {X_i}. We also provide an estimation procedure using simulated critical values to detect and locate the change points. We show that, when the jump sizes have a fixed and positive lower bound, the method will asymptotically detect all the change points with a preassigned probability and an accuracy c_n which is much smaller than 1/√n, where n is the length of the time series. The rest of the paper is organized as follows. In Section 2, we introduce the model framework and some basic assumptions. Section 3 contains our main results, including a nonparametric test for the existence of change points and a procedure for estimating the number of change points together with their locations. Practical implementation based on a bootstrap procedure and a suitable method for bandwidth selection are discussed in Section 4. Section 5 reports some simulation studies. A real data application to daily COVID-19 infections in Germany is carried out in Section 6. Section 7 contains all the proofs of the theoretical results in Section 3. Throughout this paper, we use the following notation.
A random vector X ∈ L^p if ‖X‖_p := (E|X|^p)^{1/p} < ∞. For two random variables U and V, F_{U|V}(·) denotes the conditional distribution function of U given V and f_{U|V}(·) denotes the conditional density. Finally, 1 stands for the indicator function.

2. Model framework and basic assumptions. Assume that the process {ε_i} is stationary and causal. Following Wu (2005), we assume that {ε_i} is a Bernoulli shift process such that

ε_i = G*(ξ_i), ξ_i = (· · · , η_{i-1}, η_i), (5)

where G* is a measurable function such that the process {ε_i} exists, and {η_i} are independent and identically distributed (i.i.d.) random variables. Furthermore, {ε_i} is a martingale difference sequence satisfying E[ε_i | (ε_{i-1}, ε_{i-2}, · · ·)] = 0 almost surely. From Eq. (5), one can interpret the transform G* as the underlying physical mechanism, with ξ_i and G*(ξ_i) being the input and output of the system, respectively. Similarly, we assume X_i = G(ξ_i), where G is a measurable function such that X_i exists. To facilitate the main results, we first introduce the time series dependence measures of Wu (2005) associated with X_i and ε_i. Assume X ∈ L^p, and let X′_n = G(ξ′_n), ξ′_n := (ξ_{-1}, η′_0, η_1, . . . , η_n), where X′_n is a coupled version of X_n with η_0 replaced by an i.i.d. copy η′_0. Then, we define the physical dependence measure of X_i as θ_{n,p} = ‖X_n − X′_n‖_p. Let θ_{n,p} = 0 if n < 0. Thus for n ≥ 0, θ_{n,p} measures the dependence of the output G(ξ_n) on the single input η_0. We refer to Wu (2005) for more details on physical dependence measures. Similarly, we define the physical dependence measure for the errors as θ*_{n,p} = ‖ε_n − ε′_n‖_p, where ε_n = G*(ξ_n). Let θ*_{n,p} = 0 if n < 0. Suppose that {X_i}_{i=1}^{n} is observed. Recall H_0 : M = 0; we aim to test the null hypothesis that the regression function is smooth.
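To make the coupled-process construction concrete, the following sketch (our own illustration, not from the paper) evaluates θ_{n,2} by Monte Carlo for the linear recursion X_i = a X_{i-1} + η_i, a special case in which replacing η_0 by an i.i.d. copy has a closed-form effect on X_n:

```python
import numpy as np

def theta_n2_ar1(a, n, reps=20000, seed=0):
    """Monte-Carlo estimate of theta_{n,2} = ||X_n - X_n'||_2 for the
    linear AR(1) recursion X_i = a*X_{i-1} + eta_i (a hypothetical
    example; the paper's G is a general measurable function).

    Replacing eta_0 by an i.i.d. copy eta_0' changes X_n by exactly
    a**n * (eta_0 - eta_0'), so theta_{n,2} = |a|**n * sqrt(2) for
    standard normal innovations -- a geometric decay, i.e. GMC.
    """
    rng = np.random.default_rng(seed)
    eta0, eta0p = rng.standard_normal((2, reps))
    diff = (a ** n) * (eta0 - eta0p)   # X_n - X_n' in closed form
    return float(np.sqrt(np.mean(diff ** 2)))

est = theta_n2_ar1(0.6, 5)   # close to 0.6**5 * sqrt(2)
```

The geometric decay of θ_{n,2} in n is exactly the GMC property required by Condition (c) below.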
To this end, we introduce a density-weighted anti-symmetric kernel function K̃_n, defined by

K̃_n(X, x, b) = K((X − x)/b)/w_n(x, b) − K*((X − x)/b)/w*_n(x, b),

where K(·) is a kernel function supported on S = [0, 1] with ∫_S K(u)du = 1 and K*(u) := K(−u). The data-dependent weights w_n(x, b) and w*_n(x, b) are defined by

w_n(x, b) = (nb)^{-1} Σ_{k=2}^{n} K((X_{k-1} − x)/b), w*_n(x, b) = (nb)^{-1} Σ_{k=2}^{n} K*((X_{k-1} − x)/b),

where b = b_n is the bandwidth satisfying b → 0 and nb → ∞. Note that w_n(x, b) and w*_n(x, b) are one-sided kernel density estimators. Hence K̃_n(X, x, b) can be approximated by [K((X − x)/b) − K*((X − x)/b)]/f(x), where f is the density function of X_i. Observing that K(x) − K*(x) is an anti-symmetric function, we call K̃_n(X, x, b) a density-weighted anti-symmetric kernel function. By sliding this kernel function K̃_n through the state domain, we are able to test whether µ(x) has change points. More specifically, the quantity Σ_{k=2}^{n} K̃_n(X_{k-1}, x, b)X_k/(nb) is a boundary kernel estimate of µ(x+) − µ(x−), where µ(x+) and µ(x−) are the right and left limits of µ(·) at x. Thus, if x is a continuity point of µ(·), this quantity will be approximately zero at x. However, if µ(·) is discontinuous at x, the quantity will be approximately equal to the jump size of µ(·) at x. To establish our first main result, we need the following regularity conditions:

(c) For the same p defined in Condition (b), assume that X_i ∈ L^p, θ_{n,p} = O(ρ^n), and θ*_{n,p} = O(ρ^n) for some 0 < ρ < 1.

(d) The density function f of X_i is positive on [l − ε, u + ε] for some ε > 0, and there exists a constant B < ∞ bounding f, its derivatives up to the second order, and the conditional densities.

(e) K(·) is differentiable over (0, 1), the right derivative K′(0+) and the left derivative K′(1−) exist, and sup_{0≤u≤1} |K′(u)| < ∞. The Lebesgue measure of the set {u ∈ [0, 1] : K(u) = 0} is zero. Furthermore, K(0) = K(1) = 0, K′(0) > 0 and ∫_0^1 uK(u)du = 0.

For the above regularity conditions, Condition (a) specifies the allowable range of the bandwidth. Condition (b) puts a mild moment restriction on ε_i. Condition (c) requires that the quantities θ_{n,p} and θ*_{n,p} satisfy the geometric moment contraction (GMC) property.
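The jump-size estimate that motivates K̃_n can be sketched as the difference of two one-sided Nadaraya-Watson fits, which is what the density-weighted weights w_n and w*_n achieve. Two simplifications below are our own: the kernel is a plain density on [0, 1] that does not impose the moment condition ∫uK(u)du = 0 of Condition (e), and we use an i.i.d. design (Z_k, X_k) instead of the autoregressive pairs (X_{k-1}, X_k):

```python
import numpy as np

def one_sided_kernel(u):
    """A simple kernel density on [0, 1] (6u(1-u)); for illustration
    only -- the paper's Condition (e) requires extra moment/boundary
    restrictions that we do not impose here."""
    return np.where((u >= 0.0) & (u <= 1.0), 6.0 * u * (1.0 - u), 0.0)

def jump_estimate(x, Z, X, b):
    """Estimate mu(x+) - mu(x-) as the difference of two one-sided
    Nadaraya-Watson fits; the one-sided kernel density estimators in
    the denominators play the role of w_n and w_n*."""
    kr = one_sided_kernel((Z - x) / b)   # design points in [x, x+b]
    kl = one_sided_kernel((x - Z) / b)   # design points in [x-b, x]
    mu_plus = np.sum(kr * X) / np.sum(kr)
    mu_minus = np.sum(kl * X) / np.sum(kl)
    return mu_plus - mu_minus

# i.i.d. design with a jump of size 2 in mu at x = 0
rng = np.random.default_rng(0)
n, b = 4000, 0.3
Z = rng.uniform(-2.0, 2.0, n)
X = np.where(Z < 0.0, -1.0, 1.0) + 0.2 * rng.standard_normal(n)

j_at_jump = jump_estimate(0.0, Z, X, b)    # close to 2
j_smooth = jump_estimate(1.0, Z, X, b)     # close to 0
```

At a continuity point the two one-sided fits agree up to noise, so the estimate is near zero; at the discontinuity it recovers the jump size.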
The GMC property is preserved in many linear and nonlinear time series models, such as ARMA, ARCH and GARCH models; see Shao & Wu (2007) for more discussion. Furthermore, denote Θ_n := Σ_{i=0}^{n} θ_{i,2}, which measures the cumulative dependence of X_0, ..., X_n on η_0. If Condition (c) holds, it is easy to see that Θ_∞ < ∞, which indicates short-range dependence of {X_i}. With Condition (d), we require that the density and conditional density of X_i exist and are bounded. Moreover, f has bounded derivatives up to the second order. Condition (e) puts restrictions on the smoothness and order of the kernel function K. In particular, ∫_0^1 uK(u)du = 0 indicates that K is a second-order kernel which has both positive and negative parts on [0, 1].

3. Main results. In this section, we propose a test for the existence of change points in µ(·) and an algorithm to estimate the number and locations of the change points when µ(·) is discontinuous.

3.1. Test for the existence of change points. With the foregoing discussion, we introduce a nonparametric statistic based on the density-weighted anti-symmetric kernel to test whether model Eq. (1) has change points in the state-domain regression function µ(·) on [l, u]. By proper scaling, our test statistic is defined as Π_n := sup_{x∈[l,u]} |t_n(x)|, where t_n(x) is the properly scaled version of the kernel jump-size estimate of the previous section, with the scaling involving f(·) and σ(·). In practice, since f(·) and σ(·) are unknown, we use the kernel density estimator f_n(x) and the Nadaraya-Watson (NW) estimator σ²_n(x) to replace f(x) and σ²(x), respectively. The kernel density estimator is given by

f_n(x) = (nh)^{-1} Σ_{k=1}^{n} W((X_k − x)/h),

where W(·) is a general kernel function with W(·) ≥ 0 and ∫W(u)du = 1, and h = h_n is a bandwidth sequence satisfying h → 0 and nh → ∞. Let e²_k = [X_k − µ_n(X_{k-1})]² be the squared estimated residuals, where

µ_n(x) = Σ_{k=2}^{n} W((X_{k-1} − x)/h) X_k / Σ_{k=2}^{n} W((X_{k-1} − x)/h)

is the NW estimator of µ(·); then the NW estimator of σ²(x) is given by

σ²_n(x) = Σ_{k=2}^{n} W((X_{k-1} − x)/h) e²_k / Σ_{k=2}^{n} W((X_{k-1} − x)/h).

The following remark provides the uniform consistency of the estimated density and conditional variance functions. Remark 3.1.
Under Condition (a) for both bandwidths h and b with 0 < δ_1 < 1/4, Condition (c), Condition (d) and Condition (e), we have the uniform consistency of f_n(x), with bias constant ψ_W := ∫u²W(u)du/2. Similarly, for σ²_n(x), under the conditions of Theorem 3.2 the analogous uniform consistency holds. See Section 8.1 for the proof. Let f_ε(·) be the density function of ε_i and λ_K = ∫K²(x)dx. We have the following main result on the asymptotic properties of the proposed test statistic.

Theorem 3.2. Let l, u ∈ R be fixed. Recall the piecewise formulation of Eq. (4); let T_j^ε and T^ε be the ε-neighborhoods of the intervals T_j = [a_j, a_{j+1}) and T = [l, u], respectively. Let T_a = {a_j} be the collection of the change points and T_a^ε the ε-neighborhood of T_a. Assume that Conditions (a)-(e) hold and write b̃ := b/(u − l). Then, after proper centering and scaling, the supremum of |t_n(x)| over T ∩ (T_a^b)^c converges to a Gumbel distribution. Proof. See Section 7.1.

Theorem 3.2 is a general result which establishes the asymptotic theory of the test statistic. In practical implementation, we use the density estimate f_n(x) and variance estimate σ_n(x) instead of f(x) and σ(x) to calculate t_n(x), as discussed before. Therefore, we have the following corollary.

Corollary 3.3. Under the conditions of Theorem 3.2, further assume that the bandwidths satisfy h ≤ b; then the asymptotic result of Theorem 3.2 holds for t*_n(x).

Note that in Corollary 3.3 we have added the assumption h ≤ b to ensure the consistency of f_n(x) and σ_n(x) on T ∩ (T_a^b)^c. When there is no change point in µ(·), we have similar results, as shown in the following remark, which suggests that under the null hypothesis, after proper scaling and centering, our test statistic converges to a Gumbel distribution asymptotically.

Remark 3.4. Assume H_0 : M = 0 holds. We further assume that f(·), σ(·) ∈ C³(T) and the remaining conditions of Corollary 3.3 hold. Then T_a = ∅ and the Gumbel limit holds on the whole interval.

Denote the jump size of µ(·) at a_i by ∆_i := |µ(a_i+) − µ(a_i−)|. When H_a holds true, it is easy to see that the proposed test has asymptotic power 1 as n → ∞.
In other words, with some preassigned level α ∈ (0, 1), the rejection probability under H_a tends to 1 as n → ∞. Once the null hypothesis of no change point is rejected, one would be interested in estimating the number of change points together with their locations, which we discuss in Section 3.2.

3.2. Change-point estimation. Suppose there exist a fixed number M of change points on [l, u], denoted by l < a_1 < · · · < a_M < u, with minimum jump size min_{1≤i≤M} ∆_i ≥ ∆̃_n > 0. In this paper we assume ∆̃_n = O(1), and ∆̃_n is allowed to decrease with n. The idea for estimating the number and locations of the change points is to search for local maxima of |t_n(x)| which exceed the critical value of the test. To be more specific, we propose the following procedure for change point estimation.

• For a fixed level α, perform the bootstrap procedure described in Section 4.1 to determine the critical value, say C_{n,α}.
• Set T_1 := (l, u).
• Starting from the interval T_1, find the largest local maximum of |t_n(x)| that exceeds the critical value, denote its location by â^(1), and rule out a neighborhood of â^(1).
• Repeat the previous step until all significant local maxima are found; in other words, until |t_n(x)| on the remaining intervals is everywhere below C_{n,α}.
• Denote the number of detected change points by M̂ and re-order the estimated change points as l < â_1 < · · · < â_M̂ < u.

The following theorem provides an asymptotic result for M̂ and â_i.

Theorem 3.5. Under the conditions of Theorem 3.2, further assume that K′(·) is differentiable over (0, 1) with K′(1) = 0, that the right derivative K″(0+) and the left derivative K″(1−) exist with sup_{0≤u≤1} |K″(u)| < ∞, and that the Lebesgue measure of the set {u ∈ [0, 1] : K′(u) = 0} is zero. If √(log n/(nb)) = o(∆̃_n), then for any given level α and any c_n such that 1/c_n = O(∆̃_n √(n/(b log n))), we have P(M̂ = M and max_{1≤i≤M} |â_i − a_i| ≤ c_n) → 1 − α. Proof. See Section 7.2.
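The peak-search procedure above can be sketched as follows; the exclusion radius 2b is an assumed choice (the paper does not pin down the ruled-out neighborhood in the text), and the |t_n(x)| curve used here is a toy input:

```python
import numpy as np

def detect_change_points(xs, tvals, C, b):
    """Greedy sketch of the Section 3.2 procedure: repeatedly take the
    largest |t_n(x)| exceeding the critical value C, record its location,
    and rule out a neighbourhood (assumed radius 2b) before searching
    again; stop when everything left is below C."""
    xs = np.asarray(xs, dtype=float)
    t = np.abs(np.asarray(tvals, dtype=float))
    active = np.ones(len(xs), dtype=bool)
    found = []
    while active.any():
        masked = np.where(active, t, -np.inf)
        j = int(np.argmax(masked))
        if masked[j] <= C:
            break                         # no significant maxima remain
        found.append(float(xs[j]))
        active &= np.abs(xs - xs[j]) > 2.0 * b
    return sorted(found)

# toy |t_n| curve with two peaks exceeding the threshold C = 2
grid = np.linspace(0.0, 10.0, 1001)
tcurve = (5.0 * np.exp(-((grid - 3.0) / 0.3) ** 2)
          + 4.0 * np.exp(-((grid - 7.0) / 0.3) ** 2))
hats = detect_change_points(grid, tcurve, C=2.0, b=0.5)   # near 3 and 7
```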
Theorem 3.5 reveals that, for any given small level α, with asymptotic probability 1 − α our proposed procedure correctly estimates all the change points with accuracy c_n. It is important to mention that when ∆̃_n = ∆̃ > 0, that is, when the jump sizes have a fixed lower bound, the smallest order for c_n is √(b log n/n), which is smaller than n^{-1/2}. It can also be seen as the product of √(log n) and the optimal convergence rate √(b/n) of time-domain change-point estimators established in Müller (1992). Hence, we conjecture that our rate c_n is nearly optimal for state-domain change point detection.

4.1. The bootstrap procedure. It is well known that the convergence rate to the Gumbel distribution in Theorem 3.2 is slow. As a result, a very large sample size would be needed for the approximation to be reasonably accurate. To overcome this issue, we propose a simulation-based bootstrap procedure to improve the finite-sample performance of the proposed test. The bootstrap procedure is as follows.

• Generate i.i.d. standard normal random variables U_k, k = 0, ..., n.
• Compute the quantity Π*_n defined in Eq. (26).
• Repeat the first two steps many times and take the empirical (1 − α)th quantile of Π*_n as the critical value of our test.

For the proposed bootstrap procedure, we have the following theoretical result, which shows that, with proper scaling and centering, Π*_n has the same asymptotic Gumbel distribution.

Proposition 4.1. Let Π*_n be defined as in Eq. (26), where {U_k}_{k=0}^{n} are i.i.d. standard normal random variables. Assume that H_0 : M = 0, Conditions (a) and (e) hold, and b satisfies the bandwidth requirement of Theorem 3.2. Then Π*_n, properly scaled and centered, converges to the same Gumbel distribution.

Proposition 4.1 shows that Π*_n and Π_n have the same asymptotic Gumbel distribution with proper scaling and centering under the null hypothesis. Therefore, the (1 − α)th quantile of Π_n can be estimated consistently by calculating the empirical (1 − α)th quantile C_{n,α} of Π*_n with a large number of bootstrap replications. We reject the null hypothesis at level α ∈ (0, 1) if Π_n > C_{n,α}.
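The bootstrap steps can be sketched as below. Since Eq. (26) is not reproduced in this text, `toy_stat` is a hypothetical stand-in for Π*_n; any scalar functional of the multipliers U_0, ..., U_n can be plugged in:

```python
import numpy as np

def bootstrap_critical_value(stat_fn, n, alpha, B=2000, seed=0):
    """Simulation-based critical value, as in Section 4.1: draw i.i.d.
    N(0,1) multipliers U_0,...,U_n, evaluate the bootstrap statistic,
    repeat B times, and return the empirical (1 - alpha) quantile."""
    rng = np.random.default_rng(seed)
    draws = [stat_fn(rng.standard_normal(n + 1)) for _ in range(B)]
    return float(np.quantile(draws, 1.0 - alpha))

def toy_stat(U, window=20):
    """Hypothetical stand-in for Pi*_n of Eq. (26): the maximum absolute
    moving average of the multipliers (a max-type functional)."""
    c = np.convolve(U, np.ones(window) / window, mode="valid")
    return float(np.max(np.abs(c)))

c05 = bootstrap_critical_value(toy_stat, n=500, alpha=0.05)
c10 = bootstrap_critical_value(toy_stat, n=500, alpha=0.10)
```

As expected of empirical quantiles, a smaller α yields a larger critical value.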
When implementing the procedure described in Section 3.2 for estimating the change points, we also suggest using C_{n,α} to determine the detection region. Our numerical experiments suggest that the bootstrap method yields more accurate results than those based on the asymptotic limiting distribution under small or moderate sample sizes.

4.2. Bandwidth selection. The bandwidth used in f_n(x) can be chosen by classic bandwidth selectors for nonparametric kernel density estimation. However, the choices of the bandwidth b for the test statistic t*_n(x) and of h for the estimated variance σ²_n(x) are quite nontrivial and of substantial practical interest. In this paper, we adopt the standard leave-one-out cross-validation criterion for bandwidth selection suggested by Rice & Silverman (1991), where µ_n^{(−k)}(X_k) and σ_n^{2(−k)}(X_k) are the kernel estimators of µ and σ² computed with the kth observation deleted. For example, a cross-validation bandwidth b̂ can be obtained by minimizing CV(b) with respect to b, i.e., b̂ = arg min_{b∈B} CV(b), where B is the allowable range of b. The bandwidth selection for h is similar.

5. Simulation studies. In this section, we carry out Monte Carlo simulations to examine the finite-sample performance of our proposed test and estimator. Throughout the numerical experiments, the Epanechnikov kernel W(x) = 0.75(1 − x²)1(|x| ≤ 1) is used for estimating the density and conditional variances. On the other hand, in the expression of K̃_n we adopt a higher-order kernel built from W̃(x), where W̃(x) is the kernel function on [0, 1] obtained by shifting and scaling W(x). From Theorem 3.2, one can see that the power of our test increases as λ_K decreases. As a result, we aim to maximize the corresponding quantity Q(a, b) in our simulations and data analysis.

5.1. Accuracy of the bootstrap. We run Monte Carlo simulations to study the accuracy of the proposed bootstrap procedure for finite samples n = 200, 500 and 800.
Here, we aim to test the null hypothesis H_0 of no change point in the regression function. The number of replications is fixed at 1000, and the number of bootstrap samples is B = 2000 at each replication. To guarantee the stationarity of the process {X_i}, |µ(x)| is required to be less than one (Fan & Yao 2003, Section 2.1). First, we consider Model A below to investigate the robustness of our testing procedure with respect to various levels of persistence in the data-generating process. Four additional state-domain nonlinear models (Models B-E below), where µ(·) takes various shapes, are further investigated for the accuracy of our test. In our simulations, the martingale difference process is built from N(0, 1) innovations so that {ε_i} has a period of 7, which matches the data-generating process observed in the empirical example in Section 6.

• Model A: with κ_1 = 0.2, 0.4, 0.6, 0.8 representing various levels of temporal dependence in the series.
• Model B:
• Model D: µ(x) = 0.8 sin(x), σ(x) = 1.
• Model E: µ(x) = 0.5 cos(x).

Note that the regression functions µ(·) in Models A-E are all continuous. At nominal significance levels α = 0.05 and 0.1, the simulated Type I error rates for sample sizes n = 200, 500 and 800 are reported in Tables 1-2 for Model A and Models B-E, respectively. To measure the strength of the nonlinear temporal dependence, we employ the auto-distance correlation function (ADCF) investigated in Zhou (2012). In Table 1 we report the first-order ADCF (denoted by R(1)) for Model A, while for Model E the first-order and seventh-order ADCF are listed in Table 2. One can see that the performance of our testing procedure is reasonably accurate for different sample sizes across the models, and the accuracy improves as the sample size increases. On the other hand, from Table 1 we find that as the dependence of the process becomes stronger, the Type I errors tend to be less accurate, but remain in a reasonable range.
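One replication of such an experiment can be sketched with Model D (µ(x) = 0.8 sin(x), σ(x) = 1). For simplicity, this sketch uses i.i.d. N(0, 1) errors rather than the period-7 martingale-difference errors described above:

```python
import numpy as np

def simulate_model_d(n, burn=200, seed=0):
    """Simulate Model D of Section 5.1: X_i = 0.8*sin(X_{i-1}) + eps_i
    with sigma(x) = 1. Errors are i.i.d. N(0,1) here (a simplification
    of the paper's period-7 martingale-difference construction).
    A burn-in segment is discarded so the retained path is close to
    its stationary distribution."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    x = np.empty(n + burn)
    x[0] = 0.0
    for i in range(1, n + burn):
        x[i] = 0.8 * np.sin(x[i - 1]) + eps[i]
    return x[burn:]

path = simulate_model_d(800)   # one replication at sample size n = 800
```

Since |µ(x)| = |0.8 sin(x)| ≤ 0.8 < 1, the generated chain is stationary, matching the requirement quoted from Fan & Yao (2003) above.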
5.2. Power performance. We next examine the power of the test; the innovations are again ∼ N(0, 1). Here, we consider the following two types of alternatives with a change point of size δ:

• Model F_1:
• Model F_2:

We let the jump size δ range from 0 to 1.6 for Model F_1 and from 0 to 1 for Model F_2, with the change point at location x = 0. For each model, we investigate the empirical sensitivity of our testing procedure under nominal levels 0.05 and 0.1 with sample size n = 800, based on 1000 replications. The simulated power curves for the above models are plotted in Fig. 1 and Fig. 2, respectively. According to the plots, the statistical power of the proposed testing procedure increases reasonably fast as δ increases. On the other hand, we also observe that our test shows a slower rate of increase at nearby alternatives when compared with "classic" power curves of parametric tests. We believe that part of the reason is that our nonparametric test aims at detecting alternatives from a large class of discontinuous functions, while tests tailored to specific parametric models (such as the threshold model) target a narrower class of alternative functions. Therefore, our test is expected to be less sensitive to small deviations from the null compared with those parametric tests. See also Section 5.4 for a numerical experiment that compares the sensitivity of our testing procedure with that of a parametric test for the threshold model.

5.3. Accuracy in estimating the number of change points and their locations. Utilizing the algorithm of Section 3.2, in this subsection we focus on estimating the number of change points and their locations based on 1000 realizations with sample sizes n = 200, 500 and 800. In the simulations, we let the error process {ε*_i}_{i=1}^{n} be i.i.d. standard normal random variables and consider the following two cases:

• Case 1: a single change point.
• Case 2: two change points.

The estimated locations of the change points are compared in terms of their mean absolute deviation errors (MADE) and mean squared errors (MSE).
We also report the simulated percentage of correctly estimating the number of change points. The results are listed in Table 3. One can see from Table 3 that the values of MADE and MSE are all quite small, which suggests that the locations estimated by our approach are fairly accurate. Furthermore, as the sample size increases, the percentage of correctly estimating the number of change points increases in both cases.

5.4. Comparison with threshold testing and estimation in the threshold model. In this subsection, we compare the accuracy and sensitivity of our nonparametric method with existing threshold testing and estimation methods for the classic threshold AR (TAR) model proposed by Tong & Lim (1980), when the TAR model is indeed the underlying data-generating mechanism. We consider a two-regime TAR(1) model with innovations ∼ N(0, 0.75²). First, we compare the accuracy and power of our nonparametric test with the parametric F-test of threshold nonlinearity proposed in Tsay (1989). Table 4 shows the testing results for nonlinearity of the model based on both the parametric and nonparametric methods, with sample size n = 800 and B = 2000 bootstrap samples. We observe that the nonparametric method has slightly higher power when the scale coefficient κ_2 deviates slightly from 0.5. However, as κ_2 becomes 0.1 or smaller, the parametric method has higher power than the nonparametric method. In addition, we compare the accuracy of change point estimation. We study a second TAR(1) model, again with innovations ∼ N(0, 0.75²). Note that parametric estimation of the threshold value of the above two-regime TAR(1) process can be done via the R function uTAR in the NTS package (we refer to Liu et al. (2020) for more details). The simulated MADEs and MSEs are listed in Table 5. From Table 5, one can see that both methods provide relatively accurate estimates of the location of the change point (threshold).
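A minimal version of this comparison setup can be sketched as follows. The TAR(1) coefficients below are hypothetical (the paper's exact specification is not reproduced in the text), and `ls_threshold` is our own simplified least-squares threshold search, standing in for parametric routines such as uTAR:

```python
import numpy as np

def simulate_tar1(n, phi1=0.5, phi2=-0.4, r=0.0, sd=0.75, seed=0, burn=200):
    """Simulate a two-regime TAR(1) path with hypothetical coefficients:
    X_i = phi1*X_{i-1} + e_i if X_{i-1} <= r, else phi2*X_{i-1} + e_i,
    with e_i ~ N(0, sd^2). Both |phi| < 1, so the chain is stationary."""
    rng = np.random.default_rng(seed)
    e = sd * rng.standard_normal(n + burn)
    x = np.empty(n + burn)
    x[0] = 0.0
    for i in range(1, n + burn):
        phi = phi1 if x[i - 1] <= r else phi2
        x[i] = phi * x[i - 1] + e[i]
    return x[burn:]

def ls_threshold(x, grid):
    """Grid-search least-squares threshold estimate: for each candidate
    r, fit a no-intercept AR(1) slope in each regime and keep the r with
    the smallest residual sum of squares (the basic idea behind
    parametric threshold estimation; a simplified sketch)."""
    z, y = x[:-1], x[1:]
    best_r, best_rss = None, np.inf
    for r in grid:
        rss = 0.0
        for m in (z <= r, z > r):
            if m.sum() < 10:              # skip degenerate regimes
                rss = np.inf
                break
            beta = np.dot(z[m], y[m]) / np.dot(z[m], z[m])
            rss += np.sum((y[m] - beta * z[m]) ** 2)
        if rss < best_rss:
            best_r, best_rss = r, rss
    return best_r

x = simulate_tar1(2000)
r_hat = ls_threshold(x, np.linspace(-1.0, 1.0, 81))   # near the true r = 0
```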
The parametric method shows more accurate estimation results compared with those of the nonparametric method. With the above observations, it can be seen that the parametric method is better for testing and detecting change points in the TAR model when the model is well specified. This result is not surprising, since testing sensitivity and estimation accuracy tend to be higher when the model is correctly restricted to a (smaller) parametric class.

6. Application to daily COVID-19 infections in Germany. The dataset contains 156 observations from April 28th to September 30th of 2020, which can be downloaded from https://ourworldindata.org/coronavirus-source-data. From the COVID-19 timeline, Germany registered its first case on January 28th and later suffered an outbreak of the pandemic from mid March to late April. In the data analysis, we select the aforementioned time span between the first and second waves of COVID-19 so that the time series is approximately stationary. Let X_i be the logarithm of confirmed cases on day i = 1, ..., 156 and Y_i = X_{i+1} − X_i. The sample path of {X_i} and the ADCF plot of {X_i} are shown in Fig. 3. Both plots in Fig. 3 suggest that the time series is approximately stationary and has a moderate seasonal dependence with period S = 7. The seasonal behaviour probably comes from the reporting lag during weekends, which occurs in almost every country. We consider the state-domain nonlinear regression model for Y_i (which is equivalent to Eq. (1)), where {ε_i} is a martingale difference sequence. In this application, µ(x) represents the expected percentage increase or decrease of COVID-19 cases on day i when X_{i−1} = x. We apply the proposed method to test whether µ(·) contains any change points. We choose T = [l, u] = [5.7, 7.5], which includes 82.69% of the X_i, so that data are relatively abundant in this region and the test is expected to be accurate. According to the leave-one-out cross-validation criterion, the selected bandwidths b and h are 0.446 and 0.40, respectively.
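The construction of the state-response pairs restricted to T = [l, u] can be sketched with synthetic counts (the real 156-day series comes from the Our World in Data link above):

```python
import numpy as np

# Data-preparation sketch for Section 6, with synthetic case counts.
cases = np.array([500.0, 620.0, 810.0, 1200.0, 1500.0, 1480.0, 1700.0])
X = np.log(cases)            # X_i: log of confirmed cases on day i
Y = np.diff(X)               # Y_i = X_{i+1} - X_i: daily log-growth rate

# keep only pairs whose state falls in the interval T = [l, u]
l, u = 5.7, 7.5
mask = (X[:-1] >= l) & (X[:-1] <= u)
states, responses = X[:-1][mask], Y[mask]   # inputs to the test for mu(x)
```

Each response (the conditional daily growth rate) is paired with the preceding state, which is the regression relationship the change point test is applied to.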
Following the practical implementation in Section 4.1, we calculate the empirical 99% quantile of Π*_n with 10000 bootstrap samples, which gives C_{n,α} = 1.596. Next, we investigate the behaviour of the test statistic, which is shown in Fig. 4. Our test rejects the null hypothesis of continuity of µ(·) at the 1% level and flags two change points at x̂_1 = 6.83 and x̂_2 = 7.40. Note that Y_i can be viewed as the conditional daily growth rate of COVID-19. For comparison, we also use the nonparametric local polynomial method to fit µ(x) under the assumption that there is no change point. The corresponding estimated regression function µ_n(x) over [5.7, 7.5] is plotted on the left-hand side of Fig. 5. On the right-hand side of Fig. 5 we plot the fitted drift function µ_n(x) with the knowledge of the change points. The difference between the two plots in Fig. 5 suggests that, with or without the knowledge of change points, our understanding of the relationship between Y_i and X_i can be quite different. With the knowledge of the change points, we can see that two large jumps exist at x_1 = 6.83 and x_2 = 7.40, which shows that the growth rate changes abruptly at these two points. It is evident from the right plot of Fig. 5 that these two change points divide the state domain into three regimes/phases. Furthermore, the latter plot indicates that the nonlinear dynamics can be approximated by a three-regime threshold model with the data-generating mechanism switching at the detected change points. Additionally, according to the timeline, we can identify the periods corresponding to each phase. The first phase x ∈ [5.7, 6.83) contains May 3-5, 10-11, May 15-August 5, August 9-10, 16-17, 23-24 and 30-31, where the trajectory depicts a relatively inactive period of virus transmission and the conditional infection rate µ(x) decreases from positive to negative as x increases.
The second phase x ∈ [6.83, 7.4) includes April 28-30, May 6-9, August 7-9, 11-15, 25-29 and September 8-9, 13-15, where the conditional infection rate jumps up when x surpasses 6.83 and then decreases gradually again. The third phase x ∈ [7.4, 7.5] corresponds to August 20-21, September 16-19 and 22-26, where a sudden large increase in the conditional infection rate can be found at the left boundary and then it decreases sharply, possibly due to strong governmental interventions. In summary, the analyzed period from April 28th to September 30th of 2020 of the German COVID-19 data shows a complicated nonlinear dynamic balance between disease transmission and government intervention. The proposed method of the paper could help in understanding these complex nonlinear dynamics by determining the boundaries of the phases where the state-domain relationship changes abruptly and subsequently segmenting the time series into multiple regimes.

Acknowledgments. The authors are grateful to the editor, Prof. Serena Ng, and two anonymous referees for their valuable comments and suggestions, which significantly improved the quality of the paper. Zhou Zhou's research has been partially sponsored by NSERC (No. 489079).

The main steps of the proof of Theorem 3.2 are as follows. First, we decompose the statistic t n (x) and prove the results involving the first two terms. This is given in Section 7.1.1. Secondly, we use a technique called m-dependent approximation to approximate the martingale differences by their conditional expectations E[ε k | ξ k−m,k ], where ξ k 1 ,k 2 := (η k 1 , . . . , η k 2 ), for a properly chosen order m → ∞, which simplifies the sum of a sequence of dependent random variables to a corresponding sum of m-dependent random variables. This is done in Section 7.1.2. Thirdly, we divide the sequence of n (m-dependent) random variables into alternating big and small blocks, where the length of the big blocks has a slightly higher order than that of the small blocks. Furthermore, the length of the small blocks is larger than m.
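The alternating big/small block construction just described can be sketched numerically. The helper below is our own illustration of the index bookkeeping (big blocks of length ⌊n^τ1⌋ separated by small blocks of length ⌊n^τ⌋, plus a remainder block), not code from the paper.

```python
def block_partition(n, tau, tau1):
    """Alternating big/small blocks: |H_i| = floor(n**tau1), |I_i| = floor(n**tau),
    with tau < tau1, followed by a remainder small block I_{iota_n + 1}."""
    big, small = int(n**tau1), int(n**tau)
    blocks, pos = [], 0
    while pos + big + small <= n:
        blocks.append(("H", pos, pos + big))                # big block
        blocks.append(("I", pos + big, pos + big + small))  # small block
        pos += big + small
    blocks.append(("I", pos, n))                            # remainder block
    return blocks

blocks = block_partition(10000, tau=0.3, tau1=0.6)
iota_n = sum(1 for kind, _, _ in blocks if kind == "H")
print(iota_n, blocks[-1])
```

Because each small block is longer than the dependence range m, the sums over distinct big blocks behave as independent random variables, which is exactly what the next step of the argument exploits.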
Using this proof technique, we can approximate the sum of the n (m-dependent) random variables by the sum over the subsequence of random variables residing in the big blocks. Since the length of the small blocks is larger than m, the m-dependent random variables in different big blocks are independent. This part of the proof is given in Section 7.1.3. Fourthly, we only need to deal with a sequence of independent sums of random variables within each big block. To prepare for applying the multivariate Gaussian approximation result of Zaitsev (1987), we first compute the asymptotic covariance structure of the sequence of independent sums. This is given in Section 7.1.4. In the final two steps, we first apply the multivariate Gaussian approximation of Zaitsev (1987), which is given in Section 7.1.5, and then prove the convergence to the Gumbel distribution, which is given in Section 7.1.6. The techniques used in these two steps depend heavily on existing work, particularly Liu & Wu (2010) and Zhao & Wu (2008), which in turn build on Bickel & Rosenblatt (1973) and Rosenblatt (1976).

7.1.1. Decomposition. First, we substitute X i = µ(X i−1 ) + ε i into t n (x) and separate the terms involving K and K * . We first focus on the term involving K only. Next, it is easy to see that by the definition of w(x, b), the second term of the decomposition on the right-hand side of Eq. (41) equals µ(x). For the first term of the decomposition in Eq. (41), following exactly the proof of (Liu & Wu 2010, Lemma 5.2), uniformly over x, we have that where τ n := (b log n/n) 1/2 (Zhao & Wu 2008, Lemma 2(ii)), and in the last equality we have applied the cited lemma.

7.1.2. m-dependent approximation. For the third term of the decomposition in Eq. (41), recalling that we have defined the notation ξ k 1 ,k 2 := (η k 1 , . . . , η k 2 ), we consider the decomposition of ε k , where m = n τ with τ < 1 − δ 1 .
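The idea behind the m-dependent approximation can be illustrated on a toy causal linear process with geometrically decaying coefficients, mirroring the assumption θ n,p = O(ρ n ). Everything below (the process, the parameters, the helper) is our own sketch, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(5)
rho, n, J = 0.6, 3000, 60
eta = rng.standard_normal(n + J)  # i.i.d. innovations eta_k

def truncated_process(m):
    """e_k^(m) = sum_{j=0}^{m} rho^j eta_{k-j}: an m-dependent approximation
    of the causal linear process e_k = sum_{j>=0} rho^j eta_{k-j}."""
    coef = rho ** np.arange(m + 1)
    return np.array([coef @ eta[k + J - m : k + J + 1][::-1] for k in range(n)])

e_full = truncated_process(J)  # J = 60 lags: numerically the full process
errs = [np.max(np.abs(truncated_process(m) - e_full)) for m in (2, 5, 10)]
print([round(e, 4) for e in errs])
```

The maximal approximation error decays geometrically in m, which is why a slowly growing order m = n^τ (or m = c log n, as used later) already makes the remainder negligible.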
The first and last terms in the decomposition can be ignored compared to the second term. To see this, consider Similarly, one can verify in the same way that Furthermore, since the martingale differences are uncorrelated, we have Therefore, defining we have Next, following exactly the proof of (Liu & Wu 2010, Lemma 5.3), we get that, uniformly over x, Following the above arguments again, we can compute the orders for the decomposition of the term involving K * and obtain t n (x) by taking the differences. Note that many terms, such as the µ(x) term in the second term and the O(b 2 ) term in the first term, cancel out. Therefore, overall it can be easily verified that where K̃(·) is an anti-symmetric kernel defined by Now, to prove Theorem 3.2, it suffices to show where Note that we have E[ζ i ] = 0 and E[ζ i 2 ] = 1. Next, we define a truncated version ζ̄ i of ζ i by We next define M̃ n (x) using m-dependent conditional expectations, where σ̄ 2 := E[ζ̄ 1 2 ]. Recall that m = n τ . We choose τ 1 such that τ < τ 1 < 1 − δ 1 and split [1, n] into alternating big and small blocks H 1 , I 1 , · · · , H ι n , I ι n , I ι n +1 with lengths |H i | = ⌊n τ 1 ⌋, |I i | = ⌊n τ ⌋, ∀1 ≤ i ≤ ι n , and |I ι n +1 | = n − ι n (⌊n τ 1 ⌋ + ⌊n τ ⌋). Note that ι n = ⌊n/(⌊n τ 1 ⌋ + ⌊n τ ⌋)⌋. Then we define the corresponding big-block and small-block sums. Next, we show in the following that we can approximate M n (x) by M̃ n (x) and then approximate M̃ n (x) by M̆ n (x). That is, we show To show Eq. (62), we first follow the proof of (Liu & Wu 2010, Lemma 5.1) using Freedman's inequality for martingale differences, Freedman (1975), to get which implies that we can approximate M n (x) by replacing ζ k with ζ̄ k in the definition of M n (x). Next, we write the bound Kρ 1 m , where ρ 1 = (1 + ρ)/2. Using the assumption θ n,p = O(ρ n ), we have sup Now, we can show that the maximum of the skeleton process over {x j }, j = 1, . . . , q n , is small. Recall that m is a polynomial in n; then we have Next, we show that the third term of the decomposition of K in Eq. (64) can also be ignored.
In order to show this, we define Using the same argument as in (Liu & Wu 2010, Proof of Lemma 4.2), we can approximate N n (x) by its skeleton process, since sup x j−1 ≤x≤x j |N n (x) − N n (x j )| = o P ((log n) −2 ). We first approximate sup x |N n (x)| by the maximum over the skeleton process. Then we have P(max 1≤j≤q n |N n (x j )| ≥ (log n) −2 ) = o(1) using Freedman's inequality for martingale differences, Freedman (1975). Therefore, we can approximate M n (x) by replacing E[ζ̄ k | ξ k−m,k ] by ζ̄ k /σ̄ 2 , which leads to the definition of M̃ n (x). Therefore, we have proved Therefore, in order to finish the proof of Eq. (62), it suffices to show where R n (x) := (nb) −1 Σ j∈∪ i=1 ι n +1 I i u j (x). Following the same argument as above using the skeleton process, we only need to consider the grid {x j , j = 0, . . . , q n }. Using the fact that τ < τ 1 and n −δ 1 = O(b), again by Freedman's inequality for martingale differences, for some constant C that r̃(s) := E[M̃ n (x)M̃ n (x + s)], r̄(s) := ∫ K̃(x)K̃(x + s)dx/λ K̃ , and K̃ 2 is defined analogously. Then by the definition of r̄(s), using λ K̃ = 2λ K , we have Next, according to (Bickel & Rosenblatt 1973, Theorems B1 and B2), we have r(s) = 1 − K 2 |s| 2 + o(|s| 2 ). Note that This implies r̄(s) = 1 − K̃ 2 |s| 2 + o(|s| 2 ), which can also be obtained directly from (Bickel & Rosenblatt 1973, Theorems B1 and B2). Next, we show r̃(s) = r̄(s) + O(b). Note that {ζ̄ k } are uncorrelated and E[ζ̄ k ] = 0. Then, using |f (v + s) − f (t)f (s)| = O(b) uniformly over |s − t| ≤ 2b and |v| ≤ 2b, we have E[M̃ n (t)M̃ n (s)] Therefore, we have proved that, as s → 0, r(s) = 1 − K 2 |s| 2 + o(|s| 2 ), r̄(s) = 1 − K̃ 2 |s| 2 + o(|s| 2 ), and r̃(s) = r̄(s) + O(b).

7.1.5. Gaussian approximation. Now, we go back to prove Eq. (77). We use similar techniques as in (Liu & Wu 2010, Proof of Lemma 4.5). First, as in Bickel & Rosenblatt (1973), we split the interval T into alternating big and small intervals W 1 , V 1 , . . . , W N , V N . We let w be fixed, and let v be small, going to 0.
Since u and l are fixed numbers, without loss of generality, we assume l = 0 and u = 1 in this proof. Next, we approximate Ω + := sup 0≤t≤1 M̃ n (t) by big blocks {W k }. That is, by Ψ + := max 1≤k≤N Υ k + , where Υ k + := sup t∈W k M̃ n (t). Then we further approximate Υ k + via discretization by Ξ k + := max 1≤j≤χ M̃ n (a k + jax −1 ), where χ = wx/a with a > 0. We define Ω − , Ψ − , Υ k − , and Ξ k − similarly by replacing sup or max by inf or min, respectively. Letting Ω = max(Ω + , −Ω − ) = sup 0≤t≤1 |M̃ n (t)| and x z = d n + z/(2 log b −1 ) 1/2 , we have where Next, we are ready to apply the Gaussian approximation. We first use discretization for approximating M̃ n (x). Let s j = j/(log n) 6 , 1 ≤ j < t n , where t n = ⌊1 + (log n) 6 t⌋, s t n = t. Write [s j−1 , s j ] = ∪ k=1 q n [s j,k−1 , s j,k ], where q n = (s j − s j−1 )n 2 = n 2 /(log n) 6 and s j,k − s j,k−1 = (s j − s j−1 )/q n . Following the same arguments as in (Liu & Wu 2010, Proof of Lemma 4.6), the following discretization approximation holds for all large enough Q, Next, we apply the multivariate Gaussian approximation from Zaitsev (1987). To this end, similar to the definition of u j (t), we first define ũ j (t). Note that the random variables {ũ j (t), j = 1, · · · , ι n } are independent. Then we introduce M̆ n (t) := (nbλ K̃ f (t)) −1/2 Σ j=1 ι n ũ j (t). Then using (Zaitsev 1987, Theorem 1.1) as well as sup t max 1≤j≤ι n |u j (t) − ũ j (t)| ≤ Cn −Q for large enough Q, we have where (Y n (1), . . . , Y n (t n )) is a centered Gaussian random vector with covariance matrix Σ n = Cov(M̆ n (v + s 1 ), . . . , M̆ n (v + s t n )). Let ψ be the density function of the standard Gaussian distribution, and H 2 (a) be the Pickands constant (Bickel & Rosenblatt 1973, Theorem A1, Lemma A1, and Lemma A3). Using Eq. (85), let t > 0 be such that inf{s −2 (1 − r̃(s)) : 0 ≤ s ≤ t} > 0.
Following exactly the arguments in (Liu & Wu 2010, Proof of Lemma 4.6) to apply (Bickel & Rosenblatt 1973, Lemma A3 and Lemma A4), we can obtain that for a > 0, uniformly over 0 ≤ v ≤ 1. The limit when a → 0 also holds, that is where we have used the Pickands constant H 2 = lim a→0 H 2 (a)/a = 1/√π. The left-tail version of the tail bounds also holds with ≥ x replaced by ≤ x. Therefore, it suffices to show the following convergence to the Gumbel law.

7.1.6. Convergence to Gumbel distribution. The main steps of the proof of Eq. (100) are as follows. First, we approximate M̃ n (t) by Y n (t). Then, we approximate Y n (t) by another quantity M̌ n (t), which is defined similarly to M̃ n (x) but using a sequence of i.i.d. random variables instead of the dependent time series {X k }. Finally, we apply (Rosenblatt 1976, Theorem) to show convergence to the Gumbel distribution. We define where Y n (·) is a centered Gaussian process with covariance function Cov(Y n (s 1 ), Y n (s 2 )) = Cov(M̃ n (s 1 ), M̃ n (s 2 )). First we approximate M̃ n (t) using Y n (t). Recall that w and v are the lengths of the big and small blocks W i and V i . Let N = ⌊1/(w + v)⌋. Define a different truncation order for M̃ n (t). Then using this truncated version and following exactly the same proof as (Liu & Wu 2010, Proof of Lemma 4.10), we get that, for any fixed integer l with 1 ≤ l ≤ N/2, where A k := ∪ j=1 wx/a B k,j , C k := ∪ j=1 wx/a D k,j , C does not depend on l, and Next, we construct M̌ n (t) in the following way. Let {η k ′ } be i.i.d. random variables. Letting n → ∞ and then l → ∞, by the triangle inequality, we have lim sup where M̌ n (t) is defined in the same way as M̃ n (t) by replacing {η k } with {η k ′ }. Since {η k ′ } are i.i.d., we can apply (Rosenblatt 1976, Theorem), which leads to the convergence of P(sup 0≤t≤1 |M̌ n (t)| < x z ) to e −2e −z . This completes the proof of Theorem 3.2.

7.2. Proof of Theorem 3.5. First, for positive sequences r n and s n , we write r n = Ω(s n ) if s n = o(r n ). On the other hand, r n = Θ(s n ) if both s n = O(r n ) and r n = O(s n ) hold.
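The Gumbel limit e^{−2e^{−z}} established above can be visualized with a toy simulation. The sketch below uses the classical special case of two-sided maxima of i.i.d. standard normals under the standard normalization (not the paper's dependent statistic); the sample sizes and seed are our own choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50000, 1000
a_n = np.sqrt(2 * np.log(n))
b_n = a_n - (np.log(np.log(n)) + np.log(4 * np.pi)) / (2 * a_n)

# Normalized two-sided maxima a_n * (max_k |Z_k| - b_n) over n i.i.d. normals.
M = np.array([a_n * (np.abs(rng.standard_normal(n)).max() - b_n)
              for _ in range(reps)])

z = 1.0
emp = np.mean(M <= z)          # empirical P(normalized maximum <= z)
lim = np.exp(-2 * np.exp(-z))  # Gumbel limit exp(-2 e^{-z})
print(round(emp, 3), round(lim, 3))
```

The empirical distribution function is close to the limit even at moderate n, although the convergence is known to be slow (of logarithmic order), which is why the bootstrap calibration of Section 4.1 is preferred in practice over the asymptotic critical value.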
We first argue that P(M̂ < M) → 0. Note that M̂ < M implies that at least one change point has not been detected; then we can write Then, by the validity of the bootstrap procedure, when (b log n/n) 1/2 = o(∆ n ), the power of the test goes to 1 as n → ∞, which implies that for any i, This concludes that P(M̂ < M) → 0. Next we argue that P(M̂ > M) → α. Note that M̂ > M implies that there is a set T̂ without any change point in it such that sup x∈T̂ |t n (x)| ≥ C n,α . Note that by our algorithm, we can consider T̂ to be the largest set constructed by ruling out M intervals from [l, u] such that each interval has length 2b and contains one change point. Then, since M is a fixed constant and b → 0, we have |T̂ | = (|u − l| − 2M b) + → |u − l|. Then we can apply our main result, Theorem 3.2, again on T̂ to get that P(sup x∈T̂ |t n (x)| ≥ C n,α ) → α, which implies P(M̂ > M) → α. Therefore, we have P(M̂ = M) → 1 − α. Then it suffices to show P(max 1≤i≤M |x̂ i − x i | < c n | M̂ = M) → 1. Since M is finite, we only need to focus on one change point. Let x 0 be any of the true change points and x̂ its estimate; it suffices to show P(|x̂ − x 0 | ≥ c n | M̂ = M) → 0. Without loss of generality, we assume x̂ − x 0 = ĉ n = o P (b) and t n (x 0 ) > 0. The case t n (x 0 ) < 0 can be shown using similar arguments. Now we follow arguments similar to those in Müller (1992). Define ζ(c) := t n (x 0 + c) − t n (x 0 ), for c = o(b). Then we can write ĉ n = arg max c ζ(c). Therefore, it suffices to show ĉ n = O P (∆ n −1 (b log n/n) 1/2 ). Suppose b is small enough such that the b-neighborhood of x 0 does not include any other change points; then we apply the previous decomposition in Eq. (41).
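The estimation procedure analyzed in this proof — flag exceedances of |t n (x)| over C n,α , take the largest one as a change-point estimate, rule out its 2b-neighborhood, and rescan — can be sketched as follows. This is a simplified stand-in with a synthetic test-statistic curve; the function name and all numeric values are our own assumptions.

```python
import numpy as np

def detect_changepoints(grid, tstat, C, b):
    """Greedy scan: repeatedly take the grid point with the largest exceedance
    |t_n(x)| >= C as a change-point estimate, then rule out the 2b-length
    interval around it and rescan the remaining set."""
    grid, tstat = np.asarray(grid), np.abs(np.asarray(tstat))
    active = np.ones(len(grid), dtype=bool)
    found = []
    while True:
        idx = np.argmax(np.where(active, tstat, -np.inf))
        if not active[idx] or tstat[idx] < C:
            break
        found.append(grid[idx])
        active &= np.abs(grid - grid[idx]) > b  # exclude [x_hat - b, x_hat + b]
    return sorted(found)

# Synthetic |t_n(x)|: baseline 0.5 with two peaks at x = 0.3 and x = 0.7.
grid = np.linspace(0, 1, 201)
tstat = (0.5 + 2.0 * np.exp(-((grid - 0.3) / 0.05) ** 2)
             + 3.0 * np.exp(-((grid - 0.7) / 0.05) ** 2))
print(detect_changepoints(grid, tstat, C=1.5, b=0.1))
```

The ruled-out 2b-intervals correspond exactly to the intervals removed from [l, u] in the construction of T̂ above.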
Note that since x 0 is a change point, without loss of generality, we assume µ(x) is left-continuous at x = x 0 ; then the following term has order Θ(∆ n ): Furthermore, using ∫ 0 s K̃(x)dx = Θ(s 2 ) because K̃ ′ (0) > 0, and considering the cases |X k−1 − x 0 | ∈ [0, c] and |X k−1 − x 0 | ∈ (c, b] separately, we have Therefore, we have Then, using (log n/(nb)) 1/2 = o(∆ n ), we can conclude that Recall that the estimated change point is x̂ = x 0 + ĉ n , where ĉ n = arg max c ζ(c); then the desired order for ĉ n holds whenever b 4 = o((log n)/(n∆ n )) and b 3 = o((log n)/(n∆ n 2 )). This is always true since we have assumed δ 2 ≤ 1/4, which implies b = O(n −1/4 ), so b 4 = O(1/n) = o((log n)/n). Therefore, if we choose c n > 0 such that ĉ n = o P (c n ), then we have P(|ĉ n | ≥ c n ) → 0, which implies P(|x̂ − x 0 | ≥ c n | M̂ = M) → 0.

8. Additional proofs

8.1. Proof of Remark 3.1. For σ n 2 (x), we first write it as the sum of three terms: For the first term, we first approximate ε k 2 by E[ε k 2 | ξ k−m,k ], where m = n τ with τ > 0 small enough. Using the same argument as in Section 7.1, we where we choose m = c log n with c > −1/(2 log ρ). We then divide 1, . . . , n into ⌊n/m⌋ + 1 blocks indexed by 1, · · · , ⌊n/m⌋ + 1. Then it is clear that the sum of the blocks with odd indices is independent of the sum of the blocks with even indices. Following the same argument as the proof of (Liu & Wu 2010, Theorem 2.5) for each subsequence of the blocks, and using a union bound, we can get Then, again choosing m = c log n and dividing 1, · · · , n into ⌊n/m⌋ + 1 blocks, by the same argument as in (Zhao & Wu 2008, pp.
1875), we can get = O P ((log n) 2 ((log n)/(nh 5/2 ) + ρ m )) = O P ((log n) 3 /(nh 5/2 ) + (log n) 2 /√n). Finally, for the last term, we have Then, using 0 < δ 1 < 1/4, we have that sup x |σ n 2 (x) − σ 2 (x)| = O P (h 2 log n + (log n) 2 /√(nh) + (log n) 3 /(nh 5/2 )) = O P (h 2 log n + (log n) 3 /√(nh)). (138) For f n (x), similarly, by the same arguments as the proof for σ n 2 (x), following the proof of (Liu & Wu 2010, Lemma 4.4), we can obtain sup x |f n (x) − f (x)| = O P ((log n) 3 /√(nh) + h 2 log n).

8.2. Proof of Proposition 4.1. Since {U k } k=0 n are i.i.d. standard Gaussian random variables, the proof of this proposition is simpler than that of Theorem 3.2. We can immediately prove the convergence to the Gumbel distribution by using (Rosenblatt 1976, Theorem 1).

References

On some global measures of the deviations of density function estimates
Risk, jumps, and diversification
Is the short rate drift actually nonlinear?
Multiple regimes and cross-country growth behaviour
Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation
Nonparametric estimation of functions with jump discontinuities
Nonlinear time series: nonparametric and parametric methods
A re-examination of diffusion estimators with applications to financial model validation
On tail probabilities for martingales
On the estimation of jump points in smooth curves
Jump-preserving regression and smoothing using local linear fitting: a compromise
Modelling nonlinear random vibrations using an amplitude-dependent autoregressive time series model
Testing for jumps when asset prices are observed with noise: a "swap variance" approach
A new approach to linear filtering and prediction problems
Jumps in equilibrium prices and market microstructure noise
Simultaneous nonparametric inference of time series
NTS: An R package for nonlinear time series analysis
Output fluctuations in the United States: What has changed since the early 1980's?
Change-points in nonparametric regression analysis
Two-stage change-point estimators in smooth regression models
Estimating the mean and covariance structure nonparametrically when the data are curves
On the maximal deviation of k-dimensional density estimates
Asymptotic spectral theory for nonlinear time series
Time series analysis and its applications
A nonparametric model of term structure dynamics and the market price of interest rate risk
Some advances in non-linear and adaptive modelling in time-series
Nonlinear time series analysis: A dynamics approach
Threshold autoregression, limit cycles and cyclical data (with discussion)
Testing and modeling threshold autoregressive processes
Jump and sharp cusp detection by wavelets
Nonlinear system theory: another look at dependence
On the Gaussian approximation of convolutions under multidimensional analogues of S. N. Bernstein's inequality conditions
Testing for jumps in the presence of smooth changes in trends of nonstationary time series
Nonparametric model validations for hidden Markov models with applications in financial econometrics
Kernel quantile regression for nonlinear stochastic models
Confidence bands in nonparametric time series regression
Measuring nonlinear dependence in time-series, a distance correlation approach

We write the relevant quantity as a sum of three terms. Note that ζ̄ k is uncorrelated with the second term of the right-hand side of Eq. (64). Next, we show that, under our assumptions on the physical dependence measure, the first term of the right-hand side of Eq. (64) becomes very small for large m. In order to rigorously prove this fact, defining Z k (x), we first approximate Σ k=1 n Z k (x) by the skeleton process Σ k=1 n Z k (x j ), 1 ≤ j ≤ q n , where q n = n 2 /b and x j = j/(bq n ).
Following the same arguments as in (Liu & Wu 2010, Proof of Lemma 4.2) using Freedman's inequality for martingale differences, Freedman (1975), we have sup Next, we show that sup x∈T E|Z k (x)| decays exponentially in m. We consider two cases, which finishes the proof of Eq. (62). Observing that, since K(·) is supported on [0, 1], one of the following two terms must be zero: Hence, defining M n * (x) similarly to M n (x) using K * (·) instead of K(·), by Eq. (62), we only need to focus on the following term. Clearly, in order to complete the proof of Theorem 3.2, it suffices to show

7.1.4. Asymptotic covariance structure. Next, we prove some results on the asymptotic covariance structure of {M n (x)} which will be needed later for the Gaussian approximation using the results in Bickel & Rosenblatt (1973). Define the following quantities: r(s) := ∫ K(x)K(x + s)dx/λ K , Finally, by the assumptions on K in Theorem 3.5, we can follow the same arguments as in the proof of Theorem 3.2 (the m-dependent approximation of Section 7.1.2 and the alternating big/small blocks of Section 7.1.3), applied to K̃ ′ instead of K̃, to get Furthermore, using the fact that |K̃ ′ (u)| is uniformly bounded and the mean value theorem, we have Next, we define a new kernel Ǩ such that Then we can approximate the following term using the same arguments of m-dependent approximation and alternating big/small blocks as in Section 7.1.2 and Section 7.1.3 in the proof of Theorem 3.2, applied to this new kernel Ǩ, to get