key: cord-0484869-r6csv0tm authors: Mehrizi, Reza V.; Chenouri, Shojaeddin title: Detection of Change Points in Piecewise Polynomial Signals Using Trend Filtering date: 2020-09-18 journal: nan DOI: nan sha: 4e2cac8c8d6c22f5310b7f298738a6ce0a0ddd01 doc_id: 484869 cord_uid: r6csv0tm While many approaches have been proposed for discovering abrupt changes in piecewise constant signals, few methods are available to capture these changes in piecewise polynomial signals. In this paper, we propose a change point detection method, PRUTF, based on trend filtering. By providing a comprehensive dual solution path for trend filtering, PRUTF allows us to discover change points of the underlying signal for either a given value of the regularization parameter or a specific number of steps of the algorithm. We demonstrate that the dual solution path constitutes a Gaussian bridge process that enables us to derive an exact and efficient stopping rule for terminating the search algorithm. We also prove that the estimates produced by this algorithm are asymptotically consistent in pattern recovery. This result holds even in the case of staircases (consecutive change points of the same sign) in the signal. Finally, we investigate the performance of our proposed method for various signals and then compare its performance against some state-of-the-art methods in the context of change point detection. We apply our method to three real-world datasets: the UK House Price Index (HPI), the GISS Surface Temperature Analysis (GISTEMP) and the Coronavirus disease (COVID-19) pandemic. The problem of change point detection is more than sixty years old. It was first studied by Page (1954, 1955), and since then it has been of interest to many scientists, including statisticians.
Many of the earlier developments concerned the existence of at most one change point; however, considerable attention in recent years has been given to multiple change point analysis, which has found applications in many fields such as finance and econometrics, Bai and Perron (2003); Hansen (2001), bioinformatics and genomics, Futschik et al. (2014); Lavielle (2005), climatology, Liu et al. (2010); Pezzatti et al. (2013), and technology, Siris and Papagalou (2004); Oudre et al. (2011); Lung-Yut-Fong et al. (2012); Ranganathan (2012); Galceran et al. (2017). Consequently, there is a vast and rich literature on the subject. In the following, we only review a body of literature on a retrospective change point framework closely related to our work and refer the interested readers to Eckley et al. (2011); Lee (2010); Horváth and Rice (2014); Truong et al. (2018) for more comprehensive reviews. We consider the univariate signal plus noise model y_i = f_i + ε_i, for i = 1, . . . , n, where f_i = f(i/n) is a deterministic and unknown signal with equally spaced input points over the interval [0, 1]. The error terms ε_1, . . . , ε_n are assumed to be independent and identically distributed Gaussian random variables with mean zero and finite variance σ². We assume that f(·) undergoes J_0 unknown and distinct changes at point fractions 0 = ω_0 < ω_1 < . . . < ω_{J_0} < ω_{J_0+1} = 1, where the number of change point fractions, J_0, can grow with the sample size n. Additionally, we assume that f(·) is a piecewise polynomial function of order r ∈ N. These assumptions imply that, associated with ω_0, . . . , ω_{J_0+1}, there are change point locations 0 = τ_0 < τ_1 < . . . < τ_{J_0} < τ_{J_0+1} = n, which partition the entire signal f = (f_1, . . . , f_n) into J_0 + 1 segments. The canonical multiple change point problem, where the signal f is modelled as a piecewise constant function, has been extensively studied in the literature.
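The signal-plus-noise model above is easy to simulate. The sketch below is illustrative only: the helper name, the change point locations and the segment values are mine, not from the paper, and a piecewise constant signal (r = 0) is used for simplicity.

```python
import numpy as np

def piecewise_constant(n, taus, values):
    """Piecewise constant signal f of length n whose segments are
    delimited by the (0-indexed) change point locations in `taus`."""
    f = np.empty(n)
    bounds = [0] + list(taus) + [n]
    for j, v in enumerate(values):
        f[bounds[j]:bounds[j + 1]] = v
    return f

rng = np.random.default_rng(0)
n = 200
f = piecewise_constant(n, taus=[60, 140], values=[0.0, 2.0, -1.0])
y = f + rng.normal(scale=0.5, size=n)   # y_i = f_i + eps_i, eps_i ~ N(0, sigma^2)
```

Higher-order signals (r ≥ 1) would replace the constant segments with polynomial pieces of degree r.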
In this framework, there are many approaches, and we only attempt to list a selection of them here. The majority of these techniques seek to identify all change points at once by solving an optimization problem consisting of a loss function, often the negative log-likelihood, and a penalty criterion. Yao (1988); Yao and Au (1989) used the squared error loss along with the Schwarz Information Criterion (SIC) as a penalty function to consistently estimate the bounded number of change points and their locations for data drawn from a Gaussian distribution. Within the same setting, the incorporation of various penalty functions, including the Modified Information Criterion (MIC) Pan and Chen (2006), the modified Bayesian Information Criterion (mBIC) Zhang and Siegmund (2007), the Simultaneous Information Theoretic Criterion (SITC) Wu (2008) and the modified SIC Ciuperca (2011), has been studied. Specific algorithms such as Optimal Partitioning Auger and Lawrence (1989), Segment Neighbourhood Jackson et al. (2005), and pruning approaches such as PELT Killick et al. (2012) and PDPa Rigaill (2015) have been developed to solve such optimization problems. Apart from penalty-based techniques, another frequently used class of change point detection approaches encompasses greedy search procedures, which search sequentially for one single change point at a time. The most popular methods in this class are Binary Segmentation Vostrikova (1981) and its variants such as Circular Binary Segmentation (CBS) Olshen et al. (2004) and Wild Binary Segmentation (WBS) Fryzlewicz et al. (2014). In recent years, researchers have attempted to improve Binary Segmentation's performance from statistical and computational viewpoints. One proposal is a backward (bottom-up) mechanism, called Tail Greedy Unbalanced Haar (TGUH), which is computationally fast and statistically consistent in estimating both the number and the locations of change points.
Wild Binary Segmentation 2 (WBS2) was also introduced to deal with the shortcomings of WBS in datasets with frequent changes; it has been shown to be fast in run time and accurate in detection. Beyond the canonical change point problem, signals in which f is modelled as a piecewise polynomial of order r ≥ 1 have attracted less attention in the literature despite many applications. For instance, piecewise linear signals are applied in monitoring patient health (Aminikhanghahi and Cook (2017), Stasinopoulos and Rigby (1992)), climate change (Robbins et al. (2011)), and finance (Bianchi et al. (1999)). In such a framework, Bai (1997) introduced a method based on Wald-type sequential tests, and Maidstone et al. (2017) devised a dynamic programming approach applied to an ℓ0-penalized least squares procedure. In continuous piecewise linear models, Kim et al. (2009) developed a methodology called ℓ1 trend filtering. Furthermore, Baranowski et al. (2019) put forward the method of Narrowest Over Threshold (NOT), and Anastasiou and Fryzlewicz (2019) developed an approach called Isolate-Detect (ID), both of which provide asymptotically consistent estimators of the number and locations of change points. Our goal in this paper is to introduce a unifying method covering the canonical change point problem and beyond. More precisely, the method is capable of detecting change points in piecewise polynomial signals of order r (r = 0, 1, 2, . . .) with and without a continuity constraint at the locations of change points. The detection of change points in a sequence of data can be formulated as a penalized regression fitting problem. According to our notation, the quantity f_τ − f_{τ+1} is nonzero if the signal f undergoes a change at point τ, and is zero otherwise.
Moreover, if we assume that change points are sparse, that is, the number of locations where f changes, J_0, is much smaller than the number of observations n, change points can be estimated using the one-dimensional fused lasso problem

min over f ∈ R^n of (1/2) Σ_{i=1}^n (y_i − f_i)² + λ Σ_{i=1}^{n−1} |f_i − f_{i+1}|,

where f = (f_1, . . . , f_n). This formulation of the canonical change point problem was first considered in Huang et al. (2005) and was applied to analyze a DNA copy number dataset. Harchaoui and Lévy-Leduc (2010) considered the same formulation and proved the consistency of the respective change point estimates when the number of change points is bounded. Employing the sparse fused lasso, which is composed of both the ℓ1-norm and the total variation seminorm penalties, Rinaldo et al. (2009) proposed a sparse piecewise constant fit and established the consistency of the corresponding estimates when the variance of the noise terms vanishes and the minimum magnitude of jumps is bounded from below. However, Rojas and Wahlberg (2014) argued that the consistency results achieved by Rinaldo et al. (2009) are incorrect when a frequently viewed pattern, called a staircase, exists in the signal. The staircase phenomenon occurs in a piecewise constant model when there are either two consecutive downward jumps or two consecutive upward jumps in its mean structure. The staircase pattern will be discussed in more detail in Section 6. Additionally, Qian and Jia (2016) showed that the lasso problem of Tibshirani (1996), when derived by transforming the fused lasso, does not satisfy the Irrepresentable Condition (Zhao and Yu (2006)) that is necessary and sufficient for exact pattern recovery. In particular, Qian and Jia (2016) proposed an approach called preconditioned fused lasso, based on the puffer transformation of Jia et al. (2015), and established that it can recover the exact pattern with probability approaching one. A similar approach to that of the piecewise constant signals can be considered for estimating change points in piecewise polynomial signals.
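The premise underlying the fused lasso formulation, namely that nonzero differences f_τ − f_{τ+1} flag change points, can be checked in a couple of lines on a noiseless toy signal (the signal below is mine, chosen for illustration):

```python
import numpy as np

f = np.array([1., 1., 1., 3., 3., 0., 0., 0.])   # jumps after positions 3 and 5
diffs = f[:-1] - f[1:]                           # the quantities f_tau - f_{tau+1}
change_points = np.nonzero(diffs)[0] + 1         # 1-indexed locations tau
print(change_points)                             # [3 5]
```

With noise added, these raw differences are all nonzero, which is exactly why the penalized formulations above are needed to separate true jumps from noise.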
In particular, a positive integer τ is a change location in an r-th degree piecewise polynomial signal f if the τ-th element of the vector D^(r+1) f is nonzero, where D^(r+1) is a penalty matrix that will be defined in Section 2.2. Hence, change points can be estimated from the nonzero elements of D^(r+1) f̂_λ, where f̂_λ is the trend filtering estimate. The aforementioned problem was first studied by Steidl et al. (2006) in the context of image processing and was called higher order total variation regularization. It was later rediscovered by Kim et al. (2009) and termed trend filtering in the nonparametric regression setting. Kim et al. (2009) specifically explored linear trend filtering (r = 1), which fits piecewise linear models. Tibshirani et al. (2014) extensively studied trend filtering and compared its performance with smoothing splines Green and Silverman (1993) and locally adaptive regression splines Mammen et al. (1997) in the context of nonparametric regression. Tibshirani et al. (2014) also established that trend filtering enjoys the desirable and strong theoretical properties of locally adaptive regression splines while being computationally less intensive due to its banded penalty matrix. Moreover, trend filtering has an adaptive knot selection property, which makes it well suited for change point analysis. From a computational and algorithmic standpoint, Kim et al. (2009) proposed a Primal-Dual Interior Point (PDIP) method for deriving the estimates of the linear trend filtering problem at a fixed value of λ. This can be easily carried over to the trend filtering problem of any order. Wang et al. (2014) suggested an algorithm based on a falling factorial basis, while Ramdas and Tibshirani (2016) derived an algorithm based on the Alternating Direction Method of Multipliers (ADMM) discussed in Boyd et al. (2011). The computational complexity of all these algorithms is of order O(n).
In this paper, we develop a new methodology called Pattern Recovery Using Trend Filtering (PRUTF) for identifying unknown change points in piecewise polynomial signals with no continuity restriction at change point locations. Therefore, a change point is defined as a sudden jump in the signal and all its derivatives up to order r. Figure 2 displays such change points for various r. In this paper, we make the following contributions. • We propose a generic dual solution path algorithm along the regularization parameter for trend filtering. This solution path, whose basic idea is borrowed from Tibshirani (2011), enables us to determine change points at each level of the regularization parameter. Our algorithm, PRUTF, is different from that of Tibshirani (2011) in that we remove (r + 1) coordinates of the dual variables after identifying each change point. This adjustment to the algorithm allows us to have independent dual variables between each pair of neighbouring change points. Besides, the elimination of (r + 1) coordinates at each step leads to a faster implementation of the algorithm. • We establish a stopping criterion that plays an essential role in the PRUTF algorithm used to find change points. Notably, we show that the dual variables of trend filtering between consecutive change points constitute a Gaussian bridge process. This finding allows us to introduce a threshold for terminating our proposed algorithm. • If the signal contains a staircase pattern, we prove that the method is statistically inconsistent, which makes it unfavourable. After explaining the reason for this behaviour, we modify the PRUTF algorithm to produce estimates that are consistent in terms of both the number and the locations of change points. This paper is organized as follows: we first describe how to characterize the dual optimization problem of trend filtering.
In Section 3, we develop our main algorithm, PRUTF, to use in constructing the dual solution path of trend filtering and, in turn, identifying the locations of change points. Section 4 discusses the properties of this dual solution path. We establish that the dual variables derived from the solution path form a Gaussian bridge process, which makes them favourable for statistical inference. Applying these properties, we develop a stopping rule for the change point search algorithm in Section 5. The quality of the PRUTF algorithm is validated in terms of pattern recovery of the true signal in Section 6. It is established that the proposed technique in its naive form fails to consistently identify the true signal when a special pattern, called a staircase, is present in the signal. Section 7 elaborates on how to modify the algorithm in order to estimate the true pattern consistently. Simulation results and real-world applications are presented in Section 8. We explore the performance of our proposed method for signals with frequent change points as well as models with dependent error terms in Section 9. We conclude the paper with a discussion in Section 10. We begin this section by setting up the notation that will be used throughout this article. For an m × n matrix A, we denote its rows by A_1, . . . , A_m and express the matrix as A = (A_1^T, . . . , A_m^T)^T. For an index set I = {i_1, . . . , i_k}, the matrix A_I = (A_{i_1}^T, . . . , A_{i_k}^T)^T represents the submatrix of A whose row labels are in the set I. In a similar manner, for a vector a of length m, we let a_I = (a_{i_1}, . . . , a_{i_k})^T denote the subvector of a whose coordinate labels are in I. We write A_{−I} and a_{−I} to denote A_{{1, ..., m}\I} and a_{{1, ..., m}\I}, respectively, where J\I is the set of indices in J but not in I. Furthermore, for selecting the i-th row of A, the notation [A]_i is used, and for its (i, j)-th element, the notation [A]_{ij}. Also, [a]_i extracts the i-th element of the vector a. We write diag(A) to denote the vector of the main diagonal entries of the matrix A.
Moreover, for a real number x, ⌊x⌋ denotes the greatest integer less than or equal to x, and ⌈x⌉ denotes the least integer greater than or equal to x. For a set A, the indicator function is denoted by 1(A). Recall the trend filtering problem

min over f ∈ R^n of (1/2) ||y − f||₂² + λ ||D^(r+1) f||₁, (3)

where λ ≥ 0 is the regularization parameter controlling the amount of smoothing, and the (n − r − 1) × n penalty matrix D^(r+1) is the difference operator of order (r + 1). For r = 0, the first order difference matrix D^(1) is the (n − 1) × n matrix whose i-th row has the entry −1 in position i, the entry 1 in position i + 1, and zeros elsewhere. For r ≥ 1, the difference operator of order r + 1 can be recursively computed by D^(r+1) = D^(1) D^(r). Notice that, in this matrix multiplication, we consider only the submatrix consisting of the first n − r − 2 rows and n − r − 1 columns of the matrix D^(1). Figure 1 displays the trend filtering fits for r = 0, 1, 2 for simulated data. Although the objective function in the r-th order trend filtering (3) is strictly convex, and thus the minimization has a guaranteed unique solution, the penalty term is not differentiable in f, so solving the optimization in its current form is difficult. To overcome this difficulty, we follow the argument in Tibshirani (2011) and convert this optimization problem into its dual form. Since the objective function in the primal problem is strictly convex with no constraint, strong duality holds, meaning that the primal and the dual solutions coincide. The trend filtering problem (3) can be rewritten as

min over f, z of (1/2) ||y − f||₂² + λ ||z||₁ subject to z = D f,

where, for ease of notation, we use D = D^(r+1). For any given λ > 0, the Lagrangian is

L(f, z, u) = (1/2) ||y − f||₂² + λ ||z||₁ + u^T (D f − z),

and thus the dual function is given by g(u) = min over f, z of L(f, z, u). Differentiating the Lagrangian with respect to f and setting the derivative equal to zero, we obtain f = y − D^T u. Now substituting this back into the Lagrangian L(f, z, u), and performing certain algebraic manipulations, we obtain

L*(z, u) = u^T D y − (1/2) ||D^T u||₂² − (u^T z − λ ||z||₁).

Minimizing L*(z, u), or equivalently maximizing u^T z − λ ||z||₁, with respect to z ∈ R^m leads us to the dual function g(u). Notice that sup_z {u^T z − λ ||z||₁} is the conjugate of the function λ ||z||₁ in the context of conjugate convex functions.
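The recursion D^(r+1) = D^(1) D^(r) is straightforward to realize numerically. The sketch below uses the fact that applying NumPy's iterated difference to the identity matrix yields the same operator; the sign convention may differ from the paper's display by a global factor of ±1, which is immaterial for the ℓ1 penalty.

```python
import numpy as np

def diff_operator(n, order):
    """(n - order) x n difference matrix D^(order): D @ f returns the
    order-th discrete differences of the vector f."""
    return np.diff(np.eye(n), n=order, axis=0)

D2 = diff_operator(6, 2)              # rows look like (1, -2, 1, 0, ...)
# D^(r+1) annihilates degree-r polynomials: second differences of a line are 0
line = 3.0 * np.arange(6) + 1.0
print(np.allclose(D2 @ line, 0.0))    # True
```

The null space property shown in the last line is what makes the penalty ||D^(r+1) f||₁ leave exact degree-r polynomial trends unpenalized.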
See Brezis (2010) and Boyd and Vandenberghe (2004). This conjugate function is given by

sup_z {u^T z − λ ||z||₁} = 0 if ||u||_∞ ≤ λ, and +∞ otherwise.

From all these, the dual function is given as g(u) = u^T D y − (1/2) ||D^T u||₂² over the set {u : ||u||_∞ ≤ λ}, and thus the dual problem is to find the maximum of the dual function g(u). This is equivalent to

min over u ∈ R^m of (1/2) ||y − D^T u||₂² subject to ||u||_∞ ≤ λ. (5)

The constraint in (5) is an ℓ∞-ball, or a hypercube centered at the origin with boundaries given by the set {−λ, +λ}^m. Since the matrix D is of full row rank, problem (5) is strictly convex and has a unique solution; see Ali et al. (2019). In addition, notice that the dimension of the dual vector u is m, which is smaller than that of the primal vector f and may lead to relatively faster computations. The connection between the primal and the dual solutions is given by the equations

f̂_λ = y − D^T û_λ, (7)

and û_λ = λ γ, where γ ∈ R^m is a subgradient of ||x||₁ computed at x = D f̂_λ. This subgradient is given by γ_i = sign([D f̂_λ]_i) if [D f̂_λ]_i ≠ 0, and γ_i ∈ [−1, 1] otherwise. The statements in Equations (6)-(8) are equivalent to the KKT optimality conditions of the primal problem (3). The dual problem (5) demonstrates that D^T û_λ is the projection, P_C(y), of y onto the convex polyhedron (or hypercube here) C = {x ∈ R^m : ||x||_∞ ≤ λ}. From this, the primal solution (7) can be rewritten in the form (I − P_C)(y), representing the residual projection map of y onto the polyhedron C. Our idea of applying trend filtering to discover change points in piecewise polynomial signals is inspired by Rinaldo et al. (2009) and its correction Rinaldo (2014), in which change point detection is studied using the fused lasso. Besides extending to piecewise polynomial signals, the novelty of our work is in providing an exact stopping criterion, which is based on the Gaussian bridge property of the trend filtering dual variables. In addition, we propose an algorithm which, unlike that proposed in Rinaldo et al. (2009), always produces consistent change points even in the presence of staircase patterns.
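Because the feasible set of (5) is a hypercube, Euclidean projection onto it is an elementwise clip, so the dual problem can be solved by projected gradient descent. The sketch below is my own illustration of the primal-dual relationship, not the path algorithm the paper develops; the function name and iteration count are arbitrary.

```python
import numpy as np

def tf_dual(y, D, lam, n_iter=20000):
    """Solve min_u 0.5 * ||y - D^T u||^2  s.t.  ||u||_inf <= lam by
    projected gradient descent; return the dual u and primal y - D^T u."""
    u = np.zeros(D.shape[0])
    step = 1.0 / np.linalg.eigvalsh(D @ D.T).max()   # 1/L for the quadratic
    for _ in range(n_iter):
        grad = D @ (D.T @ u - y)
        u = np.clip(u - step * grad, -lam, lam)      # projection onto the hypercube
    return u, y - D.T @ u                            # primal solution via (7)

rng = np.random.default_rng(1)
n = 20
y = rng.normal(size=n)
i = np.arange(n - 1)
D = np.zeros((n - 1, n)); D[i, i], D[i, i + 1] = -1.0, 1.0   # D^(1), r = 0

# Very large lam: constraint inactive, so the fit is the residual projection
# onto null(D), i.e. the grand mean for a first-difference penalty.
_, f_big = tf_dual(y, D, lam=1e6)
print(np.allclose(f_big, y.mean(), atol=1e-5))   # True
```

At the other extreme, λ = 0 forces u = 0, so the primal fit reproduces the data exactly, illustrating how λ interpolates between interpolation and the smoothest feasible fit.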
In this section, we construct and study the solution path of the dual variables u_λ as the regularization parameter decreases from λ = ∞ to λ = 0. In the following, the PRUTF algorithm is given to compute the entire dual solution path. This dual solution path identifies the corresponding primal solution using (7). For any given λ, we call a coordinate of u_λ a boundary coordinate if it lies on the boundary of the hypercube C = {x ∈ R^m : ||x||_∞ ≤ λ}, meaning that its absolute value equals λ. In the process of constructing the solution path, for any λ, we trace several sets, introduced below. • The set B = B(λ), called the boundary set, contains the boundary coordinates identified by u_λ. • The vector s_B = s_B(λ), called the sign vector, collectively represents the signs of the boundary points in B(λ). • The set A = A(λ), called the augmented boundary set, contains the boundary coordinates in B(λ) as well as the first r_a = ⌊(r + 1)/2⌋ coordinates immediately after them. • The vector s_A = s_A(λ) collectively represents the signs of the augmented boundary points in A(λ). In the following, we discuss the need for the augmented boundary set A. We begin by studying the structure of the dual vector u = Df in a piecewise polynomial signal of order r, where the signal is partitioned into a number of blocks defined by the positions of the change points. Because the signal f is a piecewise polynomial of order r, to compute the i-th coordinate of the vector u, we need r_b = ⌈(r + 1)/2⌉ − 1 points directly before the i-th element of f as well as r_a = ⌊(r + 1)/2⌋ points immediately after it. Consequently, the first r_a elements of Df within each block cannot be computed. Moreover, within each block, the last r_b + 1 elements of Df are all nonzero due to the existence of a change point. This observation is depicted in Figure 2. At the j-th iteration with λ = λ_j, we assume that the boundary set and its corresponding sign vector are B = B(λ) and s_B = s_B(λ), respectively.
Furthermore, we assume the augmented boundary set and its sign vector are A = A(λ) and s_A = s_A(λ), respectively. The dual coordinates can be split into the augmented boundary coordinates u_{λ_j, A} and the interior coordinates u_{λ_j, −A}. Recall from Section 2.1 that u_{λ_j, A} represents the subvector of u_λ with coordinate labels in the set A, and u_{λ_j, −A} represents the subvector of u_λ with coordinate labels in the set {1, 2, · · · , m}\A. It is apparent from the definition of the boundary coordinates that

u_{λ_j, A} = λ_j s_A. (9)

Replacing the boundary coordinates with λ_j s_A in (5) and solving the resulting quadratic problem with respect to the interior coordinates leads to their least squares estimates, given by

u_{λ, −A} = (D_{−A} D_{−A}^T)^{−1} D_{−A} y − λ (D_{−A} D_{−A}^T)^{−1} D_{−A} D_A^T s_A. (10)

It should be noted that the first term is the least squares estimate of regressing the response vector y on the design matrix D_{−A}^T, while the second term can be interpreted as a shrinkage term due to the condition ||u||_∞ ≤ λ. The expression (10) holds for λ ≤ λ_j until either an interior coordinate joins the boundary or a coordinate in the boundary set leaves the boundary. The following argument explains how to specify the values of λ at which the interior coordinates change. We define the joining time associated with the interior coordinate i ∈ {1, 2, · · · , m}\A as the time at which this interior coordinate joins the boundary. [Figure: the olive lines display the true signals with two change points at the locations 6 and 13; empty circles represent the indices at which Df does not exist.] To determine the next joining time, we reduce the value of λ in a linear direction starting from λ_j and solve u_{λ, −A} = (±λ, · · · , ±λ)^T. Note that the right-hand side of (10) can be expressed as a − λ b, with a = (D_{−A} D_{−A}^T)^{−1} D_{−A} y and b = (D_{−A} D_{−A}^T)^{−1} D_{−A} D_A^T s_A. The joining time for every i ∈ {1, 2, · · · , m}\A is hence the solution of the equation a_i − λ b_i = ±λ with respect to λ, which is given by λ_i^join = a_i / (b_i + 1) or a_i / (b_i − 1), whichever falls in the interval [0, λ_j]. Note also that each boundary coordinate must remain sign-consistent with its primal counterpart, which in turn, along with Equation (7), implies diag(s_B) [D (y − D^T u_λ)]_B > 0.
Here, for any vector η, diag(η) denotes the diagonal matrix with diagonal elements given by η, and η > 0 holds element-wise. Therefore, a coordinate i ∈ B leaves the boundary set B at the value of λ at which this sign condition is violated. Writing the condition as c_i − λ d_i > 0, the quantities c_i and d_i are the i-th components of diag(s_B) [D (y − D_{−A}^T (D_{−A} D_{−A}^T)^{−1} D_{−A} y)]_B and diag(s_B) [D (D_A^T s_A − D_{−A}^T (D_{−A} D_{−A}^T)^{−1} D_{−A} D_A^T s_A)]_B, respectively. Hence, a leaving time is obtained from the equation λ_i^leave = c_i / d_i if c_i < 0 and d_i < 0, and λ_i^leave = 0 otherwise. The conditions in the aforementioned equation are due to the fact that, at the j-th iteration with λ ≤ λ_j, the ratio c_i / d_i is a valid leaving time only if both c_i and d_i are negative. An alternative way to determine the next leaving time is to use the KKT optimality conditions of (5). We refer the reader to the supplementary materials of Tibshirani (2011). The following algorithm, PRUTF, describes the process of constructing the entire dual solution path of trend filtering. Algorithm 1 (PRUTF Algorithm) 1. Initialize the set of change point locations as τ̂_0 = ∅, the empty set. 2. At step j = 1, initialize the boundary set B_1 = {τ_1 − r_b, τ_1 − r_b + 1, . . . , τ_1} and its associated sign vector s_{B_1} = {s_1, . . . , s_1}, both with cardinality r_b + 1, where τ_1 = argmax_i |û_i| and s_1 = sign(û_{τ_1}), with û_i the i-th element of the vector û = (DD^T)^{−1} D y. The updated set of change point locations is now τ̂_1 = {τ_1}. We also record the first joining time λ_1 = |û_{τ_1}| and keep track of the augmented boundary set A_1 = {τ_1 − r_b, . . . , τ_1 + r_a} and its corresponding sign vector s_{A_1} = {s_1, . . . , s_1} of length r + 1. The dual solution is u(λ) = (DD^T)^{−1} D y, for λ ≥ λ_1. 3. For steps j = 2, 3, . . . : (a) compute the candidate joining times λ_i^join of the interior coordinates and set the next joining time λ_j^join as the largest of them; (b) compute the candidate leaving times λ_i^leave of the boundary coordinates and assign the next leaving time λ_j^leave as the largest of them; (c) set λ_j = max{λ_j^join, λ_j^leave} and update the boundary set, the augmented boundary set, their sign vectors and the set of change point locations accordingly. The critical points λ_1 ≥ λ_2 ≥ . . . ≥ 0 indicate the values of the regularization parameter at which the boundary set changes. Remark 1 Notice that the vector τ̂ derived by the PRUTF algorithm represents the locations of change points for the dual variables. In order to obtain the locations of change points in the primal variables, we must add r_a to every element of τ̂, that is, τ̂_1 + r_a, τ̂_2 + r_a, . . . .
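Step 2 of the algorithm can be sketched for r = 0: with a single clean jump, the unconstrained dual û = (DD^T)^{−1} D y is a CUSUM-like curve whose largest absolute coordinate sits at the change point. The toy signal below is mine, not from the paper, and the sign convention of D is one of the two possible choices.

```python
import numpy as np

n = 40
y = np.concatenate([np.zeros(20), 5.0 * np.ones(20)])   # jump between t = 20 and 21

i = np.arange(n - 1)
D = np.zeros((n - 1, n)); D[i, i], D[i, i + 1] = -1.0, 1.0   # D^(1), r = 0

u_hat = np.linalg.solve(D @ D.T, D @ y)     # unconstrained dual (DD^T)^{-1} D y
tau_1 = int(np.argmax(np.abs(u_hat))) + 1   # 1-indexed dual coordinate
lam_1 = float(np.abs(u_hat).max())          # first critical value lambda_1
print(tau_1)   # 20
```

For this noiseless example, the first boundary coordinate is exactly the true change point, and λ_1 = 50 is the value of the regularization parameter at which it joins the boundary.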
This relationship between the primal and dual change point sets is visible in Figure 2. Remark 2 For the fused lasso, r = 0, Lemma 1 of Tibshirani (2011), known as the boundary lemma, is satisfied since the matrix DD^T is diagonally dominant, meaning that [DD^T]_{i,i} ≥ Σ_{j≠i} |[DD^T]_{i,j}|, for i = 1, . . . , m. This lemma states that once a coordinate joins the boundary, it stays on the boundary for the rest of the path. Consequently, part (b) of step 3 in Algorithm 1 is unnecessary, and hence the next leaving time in part (c) is set to zero, i.e., λ_j^leave = 0, for every step j. However, the boundary lemma is not satisfied for r = 1, 2, 3, . . . . Remark 3 There is a subtle and important distinction between our proposed algorithm, PRUTF, and the one presented in Tibshirani (2011). The latter work studies the generalized lasso problem for an arbitrary penalty matrix D (unlike the D used in trend filtering, which must have a certain structure). The algorithm proposed in Tibshirani (2011) relies on adding or removing only one coordinate to or from the boundary set at every step. The key attribute of our algorithm is to add or remove r + 1 coordinates to or from the augmented boundary set, an approach inspired by the argument presented at the beginning of this section. Essentially, this attribute makes PRUTF, presented in Algorithm 1, well suited for change point analysis. It is important to mention that PRUTF requires at least r + 1 data points between neighbouring change points. Remark 4 For a given λ, equations (9) and (10) give the values of the dual variables in u_λ. These equations demonstrate that the dual solution path is a linear function of λ, with changes in slope at the joining or leaving times λ_1 ≥ λ_2 ≥ . . . ≥ 0. Remark 5 The number of iterations required for PRUTF, presented in Algorithm 1, is bounded above; see Tibshirani et al. (2013), Lemma 6. However, this upper bound on the number of iterations is usually very loose.
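The diagonal dominance behind the boundary lemma, and its failure for r ≥ 1, can be verified directly (the matrix size below is arbitrary):

```python
import numpy as np

def diff_operator(n, order):
    """Iterated difference operator D^(order) as an (n - order) x n matrix."""
    return np.diff(np.eye(n), n=order, axis=0)

def diag_dominant(M):
    """Check |M_ii| >= sum_{j != i} |M_ij| for every row i."""
    d = np.abs(np.diag(M))
    off = np.abs(M).sum(axis=1) - d
    return bool(np.all(d >= off))

n = 30
G0 = diff_operator(n, 1) @ diff_operator(n, 1).T   # r = 0: tridiag(-1, 2, -1)
G1 = diff_operator(n, 2) @ diff_operator(n, 2).T   # r = 1: rows (1, -4, 6, -4, 1)
print(diag_dominant(G0), diag_dominant(G1))        # True False
```

For r = 0 every interior row gives 2 ≥ 1 + 1, with equality, while for r = 1 the interior rows give 6 < 1 + 4 + 4 + 1 = 10, so the boundary lemma's hypothesis fails as soon as r ≥ 1.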
The upper bound comes from the following realization, discovered by Osborne et al. (2000) and later improved by Mairal and Yu (2012): any pair (A, s_A) appears at most once throughout the solution path. In other words, if (A, s_A) is visited in one iteration of the algorithm, neither (A, s_A) nor (A, −s_A) can reappear for the rest of the algorithm. Interestingly, this fact says that once a coordinate enters the boundary set, it cannot immediately leave the boundary set at the next step. Moreover, note that at one iteration of the PRUTF algorithm with the augmented boundary set A, the dominant computation is in solving the least squares problem (17). One can apply a QR decomposition of D_{−A}^T to solve the least squares problem, and then update the decomposition as the set A changes. However, since D_{−A} D_{−A}^T is a banded Toeplitz matrix (see Section 4), a solution of (17) always exists and can be computed using a banded Cholesky decomposition. Thus, the computational complexity of one iteration is of order O((m − |A|) r²), which is linear in the number of interior coordinates as r is fixed and usually small. Overall, if K is the total number of steps run by the PRUTF algorithm, then the total computational cost is K times this per-iteration cost; see Tibshirani (2011) and Arnold and Tibshirani (2016). Now consider change points τ_1 < . . . < τ_J that partition all the dual variables into J + 1 blocks B_j = {τ_j + 1, . . . , τ_{j+1}} for j = 0, 1, . . . , J, with the conventions that τ_0 = 0 and τ_{J+1} = m. In the following, we list some properties of matrix multiplications involving D. • It follows from the definition of the matrix D that it is a banded Toeplitz matrix with bandwidth r + 1. It turns out that the matrix DD^T has the same property, meaning that it is a square banded Toeplitz matrix. Moreover, its 2r + 3 nonzero row elements are consecutive binomial coefficients of order 2r + 2 with alternating signs. In other words, for interior rows, [DD^T]_{i,j} = (−1)^{|i−j|} C(2r + 2, r + 1 + |i − j|) for |i − j| ≤ r + 1, and zero otherwise. An example, for r = 1, is given in panel (a) of Figure 3.
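The binomial-coefficient structure of DD^T can be confirmed numerically; here for r = 1, where an interior row should read (1, −4, 6, −4, 1):

```python
import numpy as np
from math import comb

r, n = 1, 20
D = np.diff(np.eye(n), n=r + 1, axis=0)    # difference operator D^(r+1)
G = D @ D.T
mid = G.shape[0] // 2                      # pick an interior row
row = G[mid, mid - (r + 1): mid + r + 2]   # its 2r + 3 nonzero entries
binom = [(-1) ** k * comb(2 * r + 2, k) for k in range(2 * r + 3)]
print(row.tolist())   # [1.0, -4.0, 6.0, -4.0, 1.0]
```

Each interior row of DD^T is the autocorrelation of a row of D, which is why the coefficients of (x − 1)^{r+1} convolve into those of (x − 1)^{2r+2}, up to alternating signs.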
Note that DD^T is a symmetric, nonsingular and positive definite matrix (Demetriou and Lipitakis, 2001). • The matrix D_{−A} D_{−A}^T is a block diagonal matrix whose diagonal submatrices correspond to the J + 1 blocks. More precisely, the j-th submatrix on the diagonal of D_{−A} D_{−A}^T is a matrix with the first (τ_{j+1} − τ_j − r) rows and columns of DD^T; see panel (b) of Figure 3. Notice that, due to its non-singularity, the inverse (D_{−A} D_{−A}^T)^{−1} is block diagonal with the same block structure. • Another interesting term in analyzing the behaviour of the dual variables is D_A^T s_A. It can be shown that the vector D_A^T s_A can be partitioned into J + 1 subvectors associated with the change points τ_j, j = 1, . . . , J. The subvector associated with τ_j, j = 2, . . . , J − 1, is D_{A_j}^T s_{A_j}, whose elements are zero except for the first consecutive r + 1 as well as the last consecutive r + 1 elements. The first r + 1 nonzero elements of D_{A_j}^T s_{A_j} are the binomial coefficients in the expansion of s_j (x − 1)^r, and its last r + 1 elements are the binomial coefficients in the expansion of −s_{j+1} (x − 1)^r. Furthermore, the first r + 1 elements of the first subvector and the last r + 1 elements of the last subvector are also equal to zero. This pattern can be written out explicitly, for example, for a piecewise cubic signal, r = 3, with two change points τ_1 and τ_2 and signs s_1 and s_2. Additionally, a similar expression holds if the signs of two consecutive change points τ_j and τ_{j+1} are the same. Moreover, let X_j be the design matrix of the r-th order polynomial regression on the indices of the j-th segment, τ_j + 1, . . . , τ_{j+1}. Therefore, the orthogonal projection matrix I − P_D is a block diagonal matrix whose j-th block, associated with the segment τ_j + 1, . . . , τ_{j+1}, is equal to the projection map onto the column space of X_j, i.e., X_j (X_j^T X_j)^{−1} X_j^T. Equation (9) says that the absolute values of the boundary coordinates are λ. On the other hand, the values of the interior coordinates are given by (10). For a given λ, the dual variables u(t; λ) for t = 0, . . .
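The claim that I − P_D coincides with the projection onto polynomial fits can be checked for a single segment; the segment length and degree below are arbitrary choices of mine:

```python
import numpy as np

n, r = 15, 2
D = np.diff(np.eye(n), n=r + 1, axis=0)          # difference operator D^(r+1)
P_D = D.T @ np.linalg.solve(D @ D.T, D)
residual_proj = np.eye(n) - P_D                  # the map I - P_D

t = np.arange(1, n + 1, dtype=float)
X = np.vander(t, r + 1, increasing=True)         # design [1, t, t^2]
P_X = X @ np.linalg.solve(X.T @ X, X.T)          # projection onto col(X)
print(np.allclose(residual_proj, P_X))           # True
```

This holds because the null space of D^(r+1) is exactly the space of degree-r polynomials on the segment, so projecting out the row space of D is the same as projecting onto the polynomial design.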
, m, can be collectively viewed as a random bridge, that is, a conditioned random walk with drift whose end points are set to zero. Moreover, u(t; λ) is bounded between −λ and λ. The quantity u(t; λ) can also be decomposed into a sum of several smaller random bridges, which are formed by the blocks created by the change points. Recall that the last consecutive r_b + 1 elements of the block B_j are λ s_j, for any j = 0, 1, · · · , J. Hence, for t = τ_j + r_a, . . . , τ_{j+1} − r_b, the random bridge associated with the j-th block is given by (22), with the conventions s_0 = s_{J+1} = 0 ∈ R^{r+1}. It is important to note that, similar to u(t; λ), the process u_j(t; λ) satisfies the conditions u_j(τ_j + r_a; λ) = λ s_j and u_j(τ_{j+1} − r_b; λ) = λ s_{j+1}. From (22), the process u_j(t; λ) is composed of a stochastic term u_j^st(t), given in (23), and a drift term, given in (24). According to model (1) with Gaussian noise, it turns out that the discrete time stochastic process u_j^st(t) can be embedded in a continuous time Gaussian bridge process. The following theorem describes the characteristics of this process. Theorem 1 Suppose the observation vector y is drawn from the model (1), where the error vector ε has a Gaussian distribution with mean zero and covariance matrix σ² I. For given D and A, let W_j denote the suitably normalized version of the stochastic term u_j^st, for j = 0, . . . , J. Then the stochastic process W_j = {W_j(t) : (τ_j + r_a)/m ≤ t ≤ (τ_{j+1} − r_b)/m} is a Gaussian bridge process with mean zero and covariance function given in (27). The processes W_j and W_{j'} are independent for j ≠ j'. A proof is given in Appendix A.1. This theorem can be extended to the case of non-Gaussian random variables and therefore establishes a Donsker-type Central Limit Theorem for W_j. Theorem 1 guarantees that the dual variable process associated with the j-th block, i.e., u_j = {u(⌊mt⌋; λ) : (τ_j + r_a)/m ≤ t ≤ (τ_{j+1} − r_b)/m}, is a Gaussian bridge process with the drift term and the covariance matrix stated in (27).
Recall that a standard Brownian bridge process on the interval [a, b] is a standard Brownian motion B(t) conditioned on the event B(a) = B(b) = 0. It is often constructed from a Brownian motion B(t) with B(a) = 0 by setting B_0(t) = B(t) − ((t − a)/(b − a)) B(b). The mean and covariance functions of the Brownian bridge B_0(t) are E[B_0(t)] = 0 and Cov(B_0(s), B_0(t)) = (s − a)(b − t)/(b − a), for a ≤ s ≤ t ≤ b. A Gaussian bridge process is an extension of the Brownian bridge in which the Brownian motion B(t), in the definition of B_0(t), is replaced by a more general Gaussian process G(t); see, for example, Gasbarra et al. (2007). The celebrated Donsker theorem (Donsker, 1951) states that the partial sum process of a sequence of i.i.d. random variables, with mean zero and variance 1, converges weakly to a Brownian bridge process; see Van Der Vaart and Wellner (1996) or Billingsley (2013). A version of Theorem 1 involving non-Gaussian random variables would extend this result to weighted partial sum processes and show that the limiting process is a Gaussian bridge with a certain covariance structure. So the Gaussian assumption in Theorem 1 is not restrictive. It is also interesting to examine the process u^st for the cases r = 0 and r = 1.

• For piecewise constant signals (r = 0), the quantity u^st_j(t) reduces to the cumulative sum of the deviations y_i − ȳ_{(τ_j+1):τ_{j+1}}, where ȳ_{(τ_j+1):τ_{j+1}} is the sample average of y_{τ_j+1}, . . . , y_{τ_{j+1}}; that is, the CUSUM statistic for the j-th segment. It is well known that the CUSUM statistic (29) converges weakly to the Brownian bridge. In addition, for any pair of time points, its covariance function is identical to that of the Brownian bridge.

• For piecewise linear signals (r = 1), the quantity u^st_j(t) is built from the residuals y_i − f̂_i, where f̂ is the least squares fit of the simple linear regression of y_{τ_j+1}, . . . , y_{τ_{j+1}} onto the indices τ_j + 1, . . . , τ_{j+1}. As proved in Theorem 1, the preceding statistic (30) is also a Gaussian bridge process.
Furthermore, using the results in Hoskins and Ponzo (1972), for any (τ_j + r_a)/m ≤ s ≤ t ≤ (τ_{j+1} − r_b)/m, the covariance function of this Gaussian bridge process can be computed explicitly.

This section concerns developing a stopping criterion for the PRUTF algorithm. Consider the process u^st(t), given in (23) with A = ∅. It turns out that u^st(t) is a stochastic process whose local minima and maxima are attained at the change points. This structure is displayed with cyan-colored lines in Figure 4 for both piecewise constant (r = 0) and piecewise linear (r = 1) signals. As the PRUTF algorithm detects more change points and forms the augmented boundary set A, the local minima or maxima corresponding to these change points are removed from the stochastic process u^st_{−A}(t), for t = 1, . . . , m − |A|. This fact is shown by olive-colored lines in Figure 4. The last equality in (31) expresses that u^st_{−A}(t) is the stochastic term of the dual variables for all the interior coordinates, obtained by stacking the stochastic terms u^st_j(t) of the dual variables associated with the j-th block, defined in (23), for j = 0, . . . , J. This behaviour suggests a natural stopping rule for the PRUTF algorithm. As can be seen from the orange-colored lines of Figure 4, once all true change points are captured by the algorithm and stored in the augmented set A_0, the resulting process contains no noticeable optimum points and fluctuates close to the zero line (the x-axis). We terminate the search in Algorithm 1 at step j by checking whether the maximum of |u^st_{−A_j}(t)|, over t = 0, . . . , m − |A_j|, is smaller than a certain threshold. To specify this threshold exactly, as suggested by Theorem 1, we need to calculate the excursion probabilities of a Gaussian bridge process. As stated in Adler and Taylor (2009), analytic formulas for the excursion probabilities are available only for a small number of Gaussian processes.
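For the r = 0 case, where the relevant limit is a standard Brownian bridge, the threshold in this stopping rule can be computed numerically from the classical alternating series for the bridge's absolute supremum. A sketch, with function names of our own choosing (not code from the paper):

```python
import math

def bb_sup_tail(x, terms=100):
    """P( sup_{0<=t<=1} |B_0(t)| >= x ) for a standard Brownian bridge,
    via the classical Kolmogorov-type alternating series."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                     for k in range(1, terms + 1))

def bb_threshold(alpha):
    """Bisection for the x_alpha solving P( sup |B_0(t)| >= x_alpha ) = alpha."""
    lo, hi = 0.5, 5.0                    # tail(0.5) ~ 0.96, tail(5.0) ~ 0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if bb_sup_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_alpha = bb_threshold(0.05)   # roughly 1.358, the familiar Kolmogorov 5% point
```

The search then terminates once the maximum absolute value of the remaining (suitably standardized) dual process falls below x_alpha.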
One such Gaussian process is the Brownian bridge. It is well known that, for the standard Brownian bridge B_0(t) on [0, 1], Pr( sup_{0 ≤ t ≤ 1} |B_0(t)| ≥ x ) = 2 Σ_{k=1}^∞ (−1)^{k−1} exp(−2 k² x²); see, for example, Adler and Taylor (2009) and Shorack and Wellner (2009). Hence, for piecewise constant signals, the required threshold for stopping Algorithm 1 can be obtained from (32), for a suitably chosen interval [a, b]. That is, for a given value α, we choose x_α such that Pr( sup_{a ≤ t ≤ b} |B_0(t)| ≤ x_α ) = 1 − α. Therefore, for r = 0 we take a = 0 and b = 1, and we stop Algorithm 1 once this threshold is no longer exceeded. For r ≥ 1, the threshold is obtained in a similar fashion. Although the excursion probabilities for general Gaussian bridge processes are not known, we note that by adapting the steps of the proof of (32) in Beghin and Orsingher (1999), we can establish a similar formula for the Gaussian bridge process G_0(t) in Theorem 1, in which k = m − |A_{j_0}| and the quantity S²_r(k) is the k-th diagonal element of the corresponding covariance matrix; the threshold x_α is then derived from equation (35).

The main purpose of this section is to investigate whether the PRUTF algorithm can recover the true pattern of the signal, where m = n − r − 1 is the number of rows of the matrix D. We write f̂ ≍_pr f to briefly denote that f̂ recovers the pattern of f, and f = f_n to denote its dependence on the sample size n. Pattern recovery is very similar to the concept of sign recovery in the lasso (Zhao and Yu, 2006; Wainwright, 2009), as it deals with the specification of both the locations of the non-zero coefficients and their signs. The problem of pattern recovery has been studied for the special case of the fused lasso in several papers. Rinaldo et al. (2009) derived conditions under which the fused lasso consistently identifies the true pattern. This was contradicted by Rojas and Wahlberg (2014), who argued that the fused lasso does not always succeed in discovering the exact change points. Rojas and Wahlberg (2014) showed that the fused lasso can be reformulated as the usual lasso, for which the necessary conditions for exact sign recovery have been established in the literature.
Then, they proved that one such necessary condition, known as the irrepresentable condition, is not satisfied for the transformed lasso when there is a specific pattern called a staircase (Definition 2). Corrections to Rinaldo et al. (2009) appeared in Rinaldo (2014). Later on, Qian and Jia (2016) proposed a method called the puffer transformation, which is shown to be consistent in specifying the exact change points, including in the presence of staircases. In the remainder of this section, we use the dual variables to demonstrate the situations in which PRUTF can correctly recover the pattern of the true signal. Exact pattern recovery implies that the dual variables are comprised of J_0 + 1 consecutive bounded processes whose endpoints correspond to the true change points. The following theorem describes the situations in which exact pattern recovery can be attained. A particular case of this result, in the context of piecewise constant signals, was established in Rinaldo (2014).

Theorem 2. Exact pattern recovery in PRUTF occurs when the discrete-time processes u^st_j(t), t = τ_j + r_a, . . . , τ_{j+1} − r_b, for j = 0, . . . , J_0, satisfy the following conditions simultaneously with probability one:
(a) First block constraint, for t = 1, . . . , τ_1 − r_b;
(b) Last block constraint, for t = τ_{J_0} + r_a, . . . , m;
(c) Interior block constraints, for t = τ_j + r_a, . . . , τ_{j+1} − r_b; and, if s_j = s_{j+1}, which corresponds to a staircase block, u^st_j(t) ≤ 0 for all t, or u^st_j(t) ≥ 0 for all t.
In the foregoing conditions, 1 ∈ R^{r+1} is a vector of size r + 1 whose elements are all 1. A proof of the theorem is given in Appendix A.2.

We analyze the performance of the PRUTF algorithm in pattern recovery under two scenarios: signals with staircase patterns and signals without them. To our knowledge, Rojas and Wahlberg (2014) were the first to identify the staircase issue. The following theorem investigates the consistency of PRUTF in pattern recovery, both with and without staircases.
Specifically, it shows that for a signal without a staircase, the exact pattern recovery conditions are satisfied with probability approaching one. On the other hand, in the presence of staircases in the signal, the probability that these conditions hold, which is equivalent to the probability of a Gaussian bridge process never crossing the zero line, converges to zero. In the literature, the consistency of a change point method is usually characterized by the signal size n, the number of change points J_0, the noise variance σ²_n, the minimal spacing between change points, L_n = min_{j=0,...,J_0} L_{n,j} = min_{j=0,...,J_0} (τ_{j+1} − τ_j), and the minimum magnitude of the jumps at the change points, δ_n. All of the above quantities are allowed to change as n grows. In the following, we present our main theorem providing conditions under which the output of the PRUTF algorithm consistently recovers the pattern of the true signal f.

Theorem 3. (a) Non-staircase blocks: If conditions (42) and (43) hold, then the PRUTF algorithm guarantees exact pattern recovery with probability approaching one. (b) Staircase blocks: On the other hand, if the true signal f contains at least one staircase block, then the probability of exact pattern recovery by the PRUTF algorithm converges to zero.

A proof is given in Appendix A.3. The performance of PRUTF in terms of consistent pattern recovery relies on the quantity δ_n L_n^{r+1/2} / σ_n and on the choice of λ_n. In the piecewise constant case, the former reduces to the well-known signal-to-noise ratio, which is crucial for consistent change point estimation (Fryzlewicz et al., 2014; Wang et al., 2020). The statements in (42) illustrate that the consistency of PRUTF for non-staircase blocks is achievable if the quantity δ_n L_n^{r+1/2} / σ_n is of order O(n^{r+c}), for some c > 0. In addition, the number of change points J_0 is allowed to diverge, provided these conditions continue to hold. The drift term (24) plays a key role in assessing the performance of PRUTF in pattern recovery.
From (18), this drift for a staircase block B_j becomes λ s_j, which is constant in t over the entire block. Consequently, the interior dual variables u_j(t; λ) for the staircase block fluctuate around a constant level and can reach the boundary ±λ, yielding false discoveries. According to Theorem 3, if there is no staircase pattern in the underlying signal, the PRUTF algorithm consistently estimates the true signal, and fails to do so otherwise. Given the results in Theorem 3, a natural question is whether Algorithm 1 can be modified to enjoy consistent pattern recovery in all cases. In the next section, we present an effective remedy based on altering the sign of a change point associated with a staircase block.

In this section, we modify the PRUTF algorithm so that it produces consistent estimates of the number and locations of change points even in the presence of staircase patterns. As previously mentioned, for a staircase block, the drift term (24) is constant and leads to false discoveries of change points. This is shown in Figure 6. Therefore, a question arises: can we modify Algorithm 1 in such a way that the change signs of two neighbouring change points never become equal, while still yielding the solution path of trend filtering? We suggest a simple but very efficient answer to this question. Once a new change point is identified, we check whether its r-th order difference sign is the same as that of the change points immediately before and after it. If these change signs are not identical, the procedure continues to search for the next change point. Otherwise, we replace the sign of the neighbouring change point with zero. This replacement of the sign prevents the drift term (24) from becoming zero. This idea is implemented for the above signal, and the result is displayed in Figure 7. As shown in panel (b), the sign of the first change point at location 50 is set to zero, since its sign is identical to the sign of the second change point at 15.
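This sign-zeroing step amounts to a small bookkeeping operation on the list of detected change points. A sketch follows; the data layout and function name are hypothetical, not taken from Algorithm 2:

```python
import bisect

def insert_change_point(points, loc, sign):
    """points: sorted list of [location, sign] pairs already detected.
    Insert the newly detected change point; if an immediate neighbour
    carries the same difference sign (a staircase), zero that
    neighbour's sign, as in the modification described above."""
    locs = [p[0] for p in points]
    i = bisect.bisect(locs, loc)
    points.insert(i, [loc, sign])
    for j in (i - 1, i + 1):                     # left and right neighbours
        if 0 <= j < len(points) and points[j][1] == sign:
            points[j][1] = 0                     # break the staircase
    return points

# a change point at 50 with sign +1 already found; a new one at 15, also +1
print(insert_change_point([[50, 1]], 15, 1))     # [[15, 1], [50, 0]]
```

Neighbours with the opposite sign are left untouched, so the modification only activates on staircase configurations.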
This sign replacement removes the false discoveries that appeared in panel (b) of Figure 6. Based on the above argument, PRUTF as presented in Algorithm 1 can be modified as follows to avoid false discoveries and to produce consistent pattern recovery. Recall that the KKT optimality conditions for solutions of the trend filtering problem in (5) require the dual variables u_λ to be at most λ in absolute value, i.e., |u_λ| ≤ λ. This condition still holds when we replace the sign values (+1 or −1) with 0. Consequently, we have the following theorem.

Theorem 4. The mPRUTF algorithm presented in Algorithm 2 is a solution path of trend filtering.

For brevity, we do not provide the proof of Theorem 4 here; we refer the reader to the similar arguments for the LARS algorithm of the lasso in Tibshirani et al. (2013). It is worth pointing out that the mPRUTF algorithm requires slightly more computation than the original PRUTF algorithm; the increase in computation time depends directly on the number of staircase blocks in the underlying signal. To show how mPRUTF resolves the problem of false discovery in signals with staircases, we ran both algorithms on 1000 datasets generated from a piecewise constant and a piecewise linear signal. The frequency plots of the estimated change points for both algorithms are presented in Figure 8. The figure reveals that the original algorithm produces false discoveries within staircase blocks for both signals, whereas mPRUTF resolves this issue.

In this section, we provide numerical studies to demonstrate the effectiveness and performance of our proposed algorithm, mPRUTF. We begin with a simulation study and then provide real data analyses. In this section, we investigate the performance of our proposed method, mPRUTF, by a simulation study. We consider two scenarios, namely piecewise constant and piecewise linear signals with staircase patterns.
We compare our method to several powerful state-of-the-art approaches in change point analysis. These methods, a list of their available packages on CRAN, and their applicability to the different scenarios are listed in Table 1.

Table 1: A list of change point detection and estimation methods with their packages on CRAN. The last two columns indicate which methods can be applied to piecewise constant or/and piecewise linear signals.

We have adopted the simulation setting of Baranowski et al. (2019) and consider piecewise constant (PWC) and piecewise linear (PWL) signals. We apply mPRUTF, as presented in Algorithm 2, to estimate the number and the locations of the change points for the PWC and PWL signals. In each iteration of the simulation study, we simulate a dataset according to model (1) under the assumption that the error terms are independently and identically distributed as N(0, σ²). Moreover, we set the significance level to α = 0.05 for the stopping rule in (34). In order to explore the impact of different noise levels on the change point methods, we run each simulation for various values of σ in {0.5, 1, 1.5, . . . , 4.5, 5}. We run the simulation N = 5000 times and report the results for each change point technique in terms of the estimated number of change points, the MSE, the scaled Hausdorff distance and the computation time; the tuning parameters of the methods in Table 1 are set to the default values of their packages. The results for the PWC and PWL signals are presented in Figures 10 and 11, respectively. In the case of the piecewise constant signal, as shown in Figure 10, mPRUTF performs comparably to PELT and SMUCE in terms of the average number of change points, MSE and the scaled Hausdorff distance up to σ = 3, and outperforms them as σ increases. For σ ≥ 4, its performance on these measures is similar to that of WBS, NOT and ID. As indicated by the average number of change points, MSE and the scaled Hausdorff distance, WBS, NOT and ID outperform the other methods at almost all noise levels.
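The scaled Hausdorff distance used as an accuracy measure above compares the set of estimated change points with the true set. A sketch of one common definition (the exact scaling convention is an assumption on our part and may differ from the paper's):

```python
import numpy as np

def scaled_hausdorff(true_cps, est_cps, n):
    """Hausdorff distance between two change point sets, scaled by the
    signal length n. Returns 1.0 when either set is empty (worst case)."""
    t, e = np.asarray(true_cps), np.asarray(est_cps)
    if len(t) == 0 or len(e) == 0:
        return 1.0
    d_te = max(min(abs(x - y) for y in e) for x in t)   # true -> estimate
    d_et = max(min(abs(x - y) for y in t) for x in e)   # estimate -> true
    return max(d_te, d_et) / n

print(scaled_hausdorff([100, 200], [110, 195], 500))    # 0.02
```

Smaller values indicate that every true change point has a nearby estimate and vice versa.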
From a computational point of view, mPRUTF takes slightly longer, mainly due to the multiplications involving the matrix D; however, this computation time decreases as the noise level σ increases. As shown in panel (d) of Figure 10, PELT, SMUCE and ID are the fastest methods. In the case of the piecewise linear signal, mPRUTF is only compared to the NOT and ID methods, which are applicable to piecewise polynomials of order r ≥ 1.

In this section, we analyze the UK HPI, GISTEMP and COVID-19 datasets using our proposed algorithm. Because σ² is unknown for these real datasets, we apply the median absolute deviation (MAD) estimator, proposed by Hampel (1974), to robustly estimate σ². More specifically, a MAD estimate of σ for piecewise constant signals is given by σ̂ = Median(|D^(1) y|) / (√2 Φ^{−1}(0.75)), and for piecewise linear signals by σ̂ = Median(|D^(2) y|) / (√6 Φ^{−1}(0.75)), where Φ^{−1}(·) denotes the inverse cumulative distribution function of the standard normal distribution.

In the analysis of the GISTEMP data, temperature anomalies are used rather than actual temperatures. A temperature anomaly is the difference from an average, or baseline, temperature, which is typically computed by averaging thirty or more years of temperature data (1951 to 1980 in the current dataset). A positive anomaly indicates that the observed temperature was warmer than the baseline, while a negative anomaly indicates that it was cooler. For more details, see Hansen et al. (2010) and Lenssen et al. (2019). The GISTEMP dataset has been explored frequently in the change point literature; see, for example, Ruggieri (2013), James and Matteson (2015) and Baranowski et al. (2019). Panel (b) of the corresponding figure displays the estimated change points for the COVID-19 data, including those at November 26, 2020 and February 5, 2021.
As can be seen from the figure, there are remarkable declines in the growth rates in the second segment (perhaps due to the nationwide lockdown), the third segment (perhaps due to the international travel ban) and the segments from May 25, 2020 to September 9, 2020 (perhaps due to mandatory use of face masks and comprehensive contact tracing). The country witnessed a significant increase in the growth rate starting from September 9, 2020, which aligns with the reopening of businesses, schools and universities. The second national lockdown could be linked to the very small decrease in the slope of the segment from November 26, 2020 to February 5, 2021. Finally, the growth rate in the last segment appears to be under control, which could be the result of COVID vaccinations.

This section empirically investigates the performance of mPRUTF in models with frequent change points as well as in models with dependent random errors. In order to evaluate the detection power of mPRUTF in signals with frequent change points, we employ a teeth signal for the piecewise constant case and a wave signal for the piecewise linear case. For the teeth signal, we consider a signal with 29 change points and varying segment lengths, defined as follows:
• for 1 ≤ t ≤ 50, f_t = 0 if (t mod 10) ∈ {1, . . . , 5}; f_t = 1, otherwise,
• for 251 ≤ t ≤ 500, f_t = 0 if (t mod 100) ∈ {1, . . . , 50}; f_t = 1, otherwise.
The signal is displayed in the top-left panel of Figure 14. The wave signal also has 29 change points, with varying slopes, and is defined as follows:
• for 51 ≤ t ≤ 150, f_t = −1 + 0.2t if (t mod 20) ∈ {1, . . . , 10}; f_t = 1 − 0.2t, otherwise,
• for 151 ≤ t ≤ 250, f_t = −1 + 0.1t if (t mod 40) ∈ {1, . . . , 20}; f_t = 1 − 0.1t, otherwise,
• for 251 ≤ t ≤ 500, f_t = −1 + 0.04t if (t mod 100) ∈ {1, . . . , 50}; f_t = 1 − 0.04t, otherwise.
The top-right panel of Figure 14 shows this signal.
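The teeth construction above alternates half-periods of 0 and 1 within each regime. A minimal sketch for the first regime (the helper name is ours):

```python
import numpy as np

def teeth_value(t, period):
    """f_t on one regime of the teeth signal: 0 when (t mod period) lies in
    {1, ..., period/2}, and 1 otherwise, encoding the rule quoted above."""
    return 0.0 if (t % period) in range(1, period // 2 + 1) else 1.0

# the first regime spelled out in the text: t = 1..50 with period 10
f = np.array([teeth_value(t, 10) for t in range(1, 51)])
jumps = np.flatnonzero(np.diff(f) != 0) + 1   # positions where the level switches
print(len(jumps))                             # 9 switches within this regime
```

Counting the level switches makes the "frequent change points, short segments" difficulty concrete: each regime of length 50 already contributes 9 change points.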
We generated 1000 independent samples of y_t from model (1) with ε_t i.i.d. ∼ N(0, σ²) for both signals. The mPRUTF algorithm was then applied to these samples to estimate the change point locations. Figure 14 shows the histograms of the estimated change point locations for the two signals, displayed for two different values of σ. The figure provides evidence that mPRUTF is unable to effectively detect change points in signals with frequent change points and short segments. It also shows that the results deteriorate as the noise variance σ² or the polynomial order r increases. It turns out that the success of the mPRUTF algorithm relies critically on its stopping rule. Equation (34) shows that estimating the noise variance σ² and specifying the threshold x_α from a Gaussian bridge process play crucial roles in the stopping rule. As discussed in Fryzlewicz (2018), the two widely used robust estimators of σ, the median absolute deviation (MAD, used here) and the inter-quartile range (IQR), overestimate σ in frequent change point scenarios. In addition, the determination of an accurate value of the threshold x_α from (35) is affected in such scenarios. These two factors prevent the stopping rule from being effective in the mPRUTF algorithm and lead to an underestimation of the number of change points in these scenarios. We must note that such poor performance in frequent change point scenarios is not specific to mPRUTF: as investigated in Fryzlewicz (2018), state-of-the-art methods such as PELT, WBS, MOSUM, SMUCE and FDRSeg also fail in such scenarios. How is mPRUTF's performance affected by other types of random errors, such as non-Gaussian or dependent errors? This is of course an important question and will be the topic of future work. Notice that the dual solution path of trend filtering is not impacted by the type of random errors.
However, the type of random errors plays a key role in the stopping rule of mPRUTF, because the stopping rule is built on the Gaussian bridge processes established by Donsker's theorem. To empirically investigate the performance of mPRUTF under weakly dependent random errors, we carry out a simulation study. To this end, we generate N = 5000 samples from model (1) with the PWC and PWL signals. We consider errors ε_i from an AR(1) model, ε_i = ρ ε_{i−1} + e_i, for i = 1, . . . , n. Here, the e_i are independent and identically distributed random errors drawn from N(0, (1 − ρ²) σ²), with ρ ∈ {−0.75, −0.5, −0.25, 0, 0.25, 0.5, 0.75} and σ ∈ {0.5, 1, 1.5, . . . , 4.5, 5}. The results of mPRUTF for both the PWC and PWL signals are provided in Figure 15. As can be seen, the results are very similar, in terms of the average number of change points, the MSEs and the scaled Hausdorff distances, across the various values of ρ. Therefore, the mPRUTF algorithm appears to be quite robust against dependent error terms. Extensive studies of mPRUTF under non-Gaussian and dependent random errors will be carried out in future research.

This paper proposed an algorithm, PRUTF, to detect change points in piecewise polynomial signals using trend filtering. We demonstrated that the dual solution path produced by the PRUTF algorithm forms a Gaussian bridge process for any given value of the regularization parameter λ. This conclusion allowed us to derive an efficient stopping rule for terminating the search algorithm, which is vital in change point analysis. We then proved that when there is no staircase block in the signal, the method guarantees consistent pattern recovery, but that it fails to do so when the underlying signal contains a staircase. To address this shortcoming, we suggested a modification of the procedure for constructing the solution path, which effectively prevents the false discovery of change points.
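The AR(1) error design used in the robustness study above scales the innovation variance so that the stationary variance stays at σ² regardless of ρ. A sketch of that generator (our own, with a hypothetical function name):

```python
import numpy as np

def ar1_errors(n, rho, sigma, rng):
    """AR(1) errors eps_i = rho * eps_{i-1} + e_i with innovations
    e_i ~ N(0, (1 - rho^2) sigma^2), so that the stationary standard
    deviation is sigma, matching the simulation design described above."""
    eps = np.empty(n)
    eps[0] = rng.normal(scale=sigma)     # draw the initial value from stationarity
    innov = rng.normal(scale=sigma * np.sqrt(1.0 - rho**2), size=n)
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + innov[i]
    return eps

eps = ar1_errors(100_000, rho=0.5, sigma=2.0, rng=np.random.default_rng(0))
```

With this scaling, changing ρ alters only the dependence structure, not the marginal noise level, which isolates the effect of dependence on the stopping rule.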
Evidence from both the simulation studies and the real data analyses reveals the accuracy and the high detection power of the proposed method.

A.1 Proof of Theorem 1. For ε_1, . . . , ε_n, . . . ∼ N(0, σ²) and a sequence q_1, . . . , q_n, . . . of real numbers, define the partial weighted sum process {S_n(t) : 0 ≤ t ≤ 1}. Obviously, for any k ≥ 1 and any 0 < t_1 < t_2 < · · · < t_k ≤ 1, the vector (S_n(t_1), . . . , S_n(t_k)) has a multivariate normal distribution, and therefore {S_n(t) : 0 ≤ t ≤ 1} is a Gaussian process for any given n.
a) In our case, first note that W_j(t) is a partial weighted sum process of the independent and identically distributed Gaussian random variables ε_i, i = 1, . . . , n. The first equality in (A.1) is derived from the fact that the structure of the true signal f remains unchanged within the j-th block, meaning that D_{−A, ⌊mt⌋} f = 0 for (τ_j + r_a)/m ≤ t ≤ (τ_{j+1} − r_b)/m. Thus, from the aforementioned argument for S_n(t), the process W_j = {W_j(t) : (τ_j + r_a)/m ≤ t ≤ (τ_{j+1} − r_b)/m} is a Gaussian process. Additionally, with the conditions given in (26), W_j is a Gaussian bridge process over the interval (τ_j + r_a)/m ≤ t ≤ (τ_{j+1} − r_b)/m. Furthermore, from (A.1), the mean vector and covariance matrix of W_j can be computed as 0 and σ² D_{−A} D^T_{−A}, respectively.

A.2 Proof of Theorem 2.
a) For t = 1, . . . , τ_1 − r_b, and both signs ±1, the KKT conditions require the dual variables u(t) to lie between −λ and λ, which yields the constraint for the first block in (38).
b) Similar to the first block, for t = τ_{J_0} + r_a, . . . , m, the constraint leads to the result in (39).
c) For t = τ_j + r_a, . . . , τ_{j+1} − r_b and j = 1, . . . , J_0 − 1, the constraint for exact pattern recovery involves the interior blocks. Since the stochastic process u^st_j(t) is symmetric around zero, when s_{j+1} and s_j have opposite signs this constraint reduces to (40).
Otherwise, when s_{j+1} = s_j, which accounts for a staircase in block j, the constraint follows from (18). For t ∈ τ^c = {1, . . . , m} \ τ, observe that D_t(f̂_n − f) = 0; therefore, the corresponding requirement is captured in the event B_n. The last inequality in the above equation holds because, from Theorem 2, we have |u| ≤ λ_n, together with the fact that Σ_{i=1}^m |D_t D^T_i| = 2^{r+2}, for arbitrary r. In the following, we derive the conditions under which the probabilities of both events A_n and B_n converge to 1.

• To compute the probability of A_n, first note that, for every t ∈ τ, D_t f̂_n = ȳ_{n,t} − ȳ_{n,t−1} − λ_n (s_t − s_{t−1}) = D_t ε_n + D_t f − λ_n (s_t − s_{t−1}) ≤ |D_t ε_n| + λ_n |s_t − s_{t−1}| + |D_t f|, where ȳ_{n,t} is the average of the observations in the segment created by block t. The last inequality is derived from the triangle inequality. Therefore, in order to verify A_n, it suffices to show that the complementary probability vanishes. This probability converges to zero if, for some ξ > 0, the following conditions hold: λ_n L_n / σ_n → ∞ and 2 λ_n L_n / (σ_n log(n − J_0)) > 1 + ξ. (A.14)

Case of arbitrary r: For a piecewise polynomial of order r (r ∈ N), note that the same argument applies for any t ∈ [τ_j, τ_{j+1}). Therefore, for arbitrary r, the PRUTF algorithm is consistent in pattern recovery if, in addition to λ_n < δ_n L_n^{2r+1} / (n^{2r} 2^{r+2}), the conditions in (42) and (43) hold.

(b) As shown in part (c) of Theorem 2, in staircase blocks, the violation of the KKT conditions reduces to a Gaussian bridge process crossing the zero line. Suppose the j-th block is a staircase block; then PRUTF can attain exact recovery only if u^st_j(t) ≤ 0 for all t, or u^st_j(t) ≥ 0 for all t, with (τ_j + r_a)/m ≤ t ≤ (τ_{j+1} − r_b)/m. Hence the probability of this event is equal to Pr( max_{0 ≤ t ≤ L_j} u^st_j(t) ≤ 0 ). According to Beghin and Orsingher (1999), this probability can be computed explicitly, where S²_r(L_j) is the L_j-th diagonal element of the matrix σ² (D_{A_j} D^T_{A_j})^{−1}. As a result, the probability converges to zero as a vanishes.
This result implies that the PRUTF algorithm fails to consistently recover the true pattern in the presence of staircase patterns.

References

• Random Fields and Geometry
• The generalized lasso problem and uniqueness
• A survey of methods for time series change point detection
• Detecting multiple generalized change-points by isolating single ones
• Efficient implementations of the generalized lasso dual path algorithm
• Algorithms for the optimal identification of segment neighborhoods
• Estimating multiple breaks one at a time
• Computation and analysis of multiple structural change models
• Narrowest-over-threshold detection of multiple change points and change-point-like features
• On the maximum of the generalized Brownian bridge
• A comparison of methods for trend estimation
• Convergence of Probability Measures
• Distributed optimization and statistical learning via the alternating direction method of multipliers
• Convex Optimization
• Functional Analysis, Sobolev Spaces and Partial Differential Equations
• A general criterion to determine the number of change-points
• Model selection by lasso methods in a change-point model
• Certain positive definite submatrices that arise from binomial coefficient matrices
• An invariance principle for certain probability limit theorems
• Analysis of changepoint models
• Multiscale change point inference
• Detecting possibly frequent change-points: wild binary segmentation 2 and steepest-drop model selection
• Wild binary segmentation for multiple change-point detection
• Tail-greedy bottom-up data decompositions and fast multiple change-point detection
• Multiscale DNA partitioning: statistical evidence for segments
• Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: theory and experiment
• Gaussian bridges
• Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach
• The influence curve and its role in robust estimation
• The new econometrics of structural change: dating breaks in US labour productivity
• Global surface temperature change
• Multiple change-point estimation with a total variation penalty
• Extensions of some classical methods in change point analysis
• Some properties of a class of band matrices
• Detection of DNA copy number alterations using penalized least squares regression
• An algorithm for optimal partitioning of data on an interval
• Change points via probabilistically pruned objectives
• Preconditioning the lasso for sign consistency
• Optimal detection of changepoints with a linear computational cost
• Using penalized contrasts for the change-point problem
• Change-point problems: bibliography and review
• Improvements in the uncertainty model in the Goddard Institute for Space Studies Surface Temperature (GISTEMP) analysis
• Impacts of climate change and human activities on surface runoff in the Dongjiang River basin of China
• Distributed detection/localization of change-points in high-dimensional network traffic data
• On optimal multiple changepoint algorithms for large data
• Complexity analysis of the lasso regularization path
• Locally adaptive regression splines
• Circular binary segmentation for the analysis of array-based DNA copy number data
• On the lasso and its dual
• Segmentation of accelerometer signals recorded during continuous treadmill walking
• A test for a change in a parameter occurring at an unknown point
• Continuous inspection schemes
• Application of modified information criterion to multiple change point problems
• Fire regime shifts as a consequence of fire policy and socio-economic development: an analysis based on the change point approach
• On stepwise pattern recovery of the fused lasso
• Fast and flexible ADMM algorithms for trend filtering
• PLISS: labeling places using online changepoint detection
• A pruned dynamic programming algorithm to recover the best segmentations with 1 to k_max change-points
• Corrections to "Properties and refinements of the fused lasso"
• Properties and refinements of the fused lasso
• Changepoints in the North Atlantic tropical cyclone record
• On change point detection using the fused lasso method
• A Bayesian approach to detecting change points in climatic records
• Empirical Processes with Applications to Statistics
• Application of anomaly detection algorithms for detecting SYN flooding attacks
• Detecting break points in generalised linear models
• Splines in higher order TV regularization
• Regression shrinkage and selection via the lasso
• The solution path of the generalized lasso
• The lasso problem and uniqueness
• Adaptive piecewise polynomial estimation via trend filtering
• A review of change point detection methods
• Weak Convergence and Empirical Processes: With Applications to Statistics
• Detecting "disorder" in multidimensional random processes
• Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (lasso)
• Univariate mean change point detection: penalization, CUSUM and optimality
• The falling factorial basis and its statistical applications
• Simultaneous change point analysis and variable selection in a regression problem
• Estimating the number of change-points via Schwarz' criterion
• Least-squares estimation of a step function
• Localising change points in piecewise polynomials of general degrees
• A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data
• On model selection consistency of lasso