title: Changepoint detection in random coefficient autoregressive models
authors: Horváth, Lajos; Trapani, Lorenzo
date: 2021-04-27

We propose a family of CUSUM-based statistics to detect the presence of changepoints in the deterministic part of the autoregressive parameter in a Random Coefficient AutoRegressive (RCA) sequence. In order to ensure the ability to detect breaks at sample endpoints, we thoroughly study weighted CUSUM statistics, analysing the asymptotics for virtually all possible weighting schemes, including the standardised CUSUM process (for which we derive a Darling-Erdős theorem) and even heavier weights (studying the so-called Rényi statistics). Our results are valid irrespective of whether the sequence is stationary or not, and no prior knowledge of stationarity or lack thereof is required. Technically, our results require strong approximations which, in the nonstationary case, are entirely new. Similarly, we allow for heteroskedasticity of unknown form in both the error term and in the stochastic part of the autoregressive coefficient, proposing a family of test statistics which are robust to heteroskedasticity, without requiring any prior knowledge as to the presence or type thereof. Simulations show that our procedures work very well in finite samples. We complement our theory with applications to financial, economic and epidemiological time series.

In this paper we study the stability of the autoregressive parameter of an RCA(1) sequence:

  y_i = (β_0 + ε_{i,1}) y_{i-1} + ε_{i,2},  1 ≤ i ≤ k*,
  y_i = (β_A + ε_{i,1}) y_{i-1} + ε_{i,2},  k* < i ≤ N,   (1.1)

where y_0 denotes an initial value. We test for the null hypothesis of no change versus the alternative of at most one change (AMOC), i.e.

  H_0: k* > N,   (1.2)
  H_A: 1 < k* < N and β_0 ≠ β_A.   (1.3)

The RCA model was first studied by Anděl (1976) and Nicholls and Quinn (2012). It belongs to the wider class of nonlinear models for time series (see Fan and Yao, 2008), which have been proposed "as a reaction against the supremacy of linear ones - a situation inherited from strong, though often implicit, Gaussian assumptions" (Akharif and Hallin, 2003). Arguably, (1.1) is very flexible, allowing the autoregressive "root" β_0 + ε_{i,1} to vary over time, and thus allowing for the possibility of having stationary and nonstationary regimes. This may be a more appropriate model than a linear specification (see Lieberman, 2012; Leybourne et al., 1996); Giraitis et al. (2014) argue that a time-varying parameter model like (1.1) can be viewed as a competitor for a model with an abrupt break in the autoregressive root. Furthermore, equation (1.1) also allows for the possibility of (conditional) heteroskedasticity in y_i; Tsay (1987) shows that the widely popular ARCH model by Engle (1982) can be cast into (1.1), which can therefore be viewed as a second-order equivalent. Finally, a major advantage of (1.1) compared to standard autoregressive models is that estimators of β_0 are always asymptotically normal, irrespective of whether y_i is stationary or nonstationary, thus avoiding the risk of over-differencing (see Leybourne et al., 1996). Given such generality and flexibility, (1.1) has been used in many applied sciences, including biology (Stenseth et al., 1998), medicine (Fryz, 2017), and physics (Ślęzak et al., 2019). The RCA model has also been applied successfully in the analysis of economic and financial data, and we refer to the recent contribution by Regis et al. (2021) for a comprehensive review.
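To fix ideas, here is a minimal simulation sketch of (1.1) under the AMOC alternative; the function name and parameter values are illustrative only and are not taken from the paper.

```python
import numpy as np

def simulate_rca1(N, beta0, betaA, kstar, sigma1=0.1, sigma2=0.5,
                  y0=0.0, rng=None):
    """Simulate an RCA(1) path y_1, ..., y_N from (1.1), with an AMOC break
    in the deterministic part of the autoregressive coefficient:
    beta = beta0 for i <= kstar and beta = betaA for i > kstar."""
    rng = np.random.default_rng(rng)
    eps1 = rng.normal(0.0, sigma1, N)   # stochastic part of the AR coefficient
    eps2 = rng.normal(0.0, sigma2, N)   # innovation term
    y = np.empty(N + 1)
    y[0] = y0
    for i in range(1, N + 1):
        beta = beta0 if i <= kstar else betaA
        y[i] = (beta + eps1[i - 1]) * y[i - 1] + eps2[i - 1]
    return y                            # (y_0, y_1, ..., y_N)

# Example: a stationary regime switching to an explosive one mid-sample.
y = simulate_rca1(N=500, beta0=0.9, betaA=1.05, kstar=250, rng=42)
```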
The inferential theory for (1.1) has been studied extensively (see e.g. Schick, 1996; Koul and Schick, 1996; and Janečková and Prášková, 2004). Testing procedures have also been developed, including tests for stationarity (see e.g. Zhao and Wang, 2012; and Trapani, 2021) and for the randomness of the autoregressive coefficient (Akharif and Hallin, 2003; Nagakura, 2009; and Horváth and Trapani, 2019). In contrast, changepoint detection is still underexplored in the RCA framework. To the best of our knowledge, the only exceptions are Lee (1998), Lee et al. (2003) and Aue (2004); in these papers, a CUSUM test is proposed, but only for the stationary case and based on the unweighted CUSUM process. The latter is well known to suffer from low power, being in particular less able to detect changepoints occurring at the beginning/end of the sample. As a solution, the literature has proposed weighted versions of the CUSUM process on the interval [0, 1], where more emphasis is given to observations at the sample endpoints (see Csörgő and Horváth, 1997). Weighting functions are typically of the form [t(1 − t)]^κ with 0 ≤ κ < ∞, for t ∈ [0, 1], with more weight placed on observations at the endpoints as κ increases. In particular, the case κ = 1/2 corresponds to the standardised CUSUM process also proposed by Andrews (1993), whereas the more heavily weighted case κ > 1/2 corresponds to a family of test statistics known as "Rényi statistics" (see Horváth et al., 2020b). When κ > 0, the asymptotics becomes more complicated, since the weighted statistics diverge at the endpoints t = 0 and t = 1, and one can no longer rely on weak convergence to derive the limiting distributions. In order to overcome this issue, Andrews (1993) proposes trimming the interval on which the weighted CUSUM process is studied; however, this has the undesirable consequence that tests are unable to detect breaks when these occur e.g. at the end of the sample. In this paper, we bridge all the gaps mentioned above by proposing a family of weighted, untrimmed CUSUM statistics. Our paper makes the following four contributions. First, we study virtually all possible weighting schemes, deriving the asymptotics for all 0 ≤ κ < ∞. From a practical viewpoint, this entails that our test statistics are designed to detect breaks even when these are very close to the sample endpoints. Second, all our results hold irrespective of whether y_i is stationary or not; this robustness arises from using the WLS estimator, and from the well-known fact that the RCA model does not suffer from the "knife edge effect" which characterizes linear models (Lumsdaine, 1996). From a practical point of view, this entails that the tests can be applied with no modifications required, and no prior knowledge of the stationarity of y_i or lack thereof. This feature is particularly desirable e.g. in the context of detecting the beginning (or end) of bubbles (see Harvey et al., 2016): with our set-up, it is possible to detect changes from stationary to nonstationary/explosive behaviour (as e.g. in Horváth et al., 2020; and Horváth et al., 2021) which characterize the emergence of a bubble, but it is also possible - again with no modifications required - to detect changes from explosive to non-explosive behaviour, as would be the case at the end of a bubble.
Being able to accommodate both cases is a distinctive advantage of the RCA set-up: whilst tests for changes towards explosive behaviour have been developed in the literature (see, inter alia, Phillips et al., 2011; Phillips et al., 2015; and the review by Homm and Breitung, 2012), tests to detect changes away from explosive behaviour are rarer, possibly due to the more complicated asymptotics in this case. Third, we allow for heteroskedasticity in both ε_{i,1} and ε_{i,2}, which is usually not considered in the RCA context; interestingly, for the case κ ≥ 1/2, we recover the same, nuisance-free distribution as in the homoskedastic case (in particular, when κ = 1/2, we obtain a "classical" Darling-Erdős limit theorem). Hence, our modified test statistics can be used from the outset, with no prior knowledge required as to whether ε_{i,1}, or ε_{i,2}, or both, is heteroskedastic. Fourth, our asymptotics is based on strong approximations for the partial sums of an RCA sequence, which are valid irrespective of the stationarity or lack thereof of y_i; the strong approximation for the nonstationary case is entirely new. The remainder of the paper is organised as follows. We present our test statistics in Section 2, and study their asymptotics in the homoskedastic case, as a benchmark, in Section 3. The heteroskedastic case is studied in Section 4. In Section 5, we report a simulation exercise; applications to real data are in Section 6. Section 7 concludes. Extensions, technical lemmas and all proofs are relegated to the Supplement.

NOTATION. We use the following notation: "→^D" for weak convergence; "→^P" for convergence in probability; "a.s." for "almost surely"; "=^D" for equality in distribution; ⌊·⌋ is the integer value function. Positive, finite constants are denoted as c_0, c_1, ... and their value may change from line to line. Other notation is introduced further in the paper.

Our approach is based on comparing the estimates of β_0 before and after each point in time k, by dividing the data into two subsets at k and estimating the autoregressive parameter in both subsamples. As mentioned above, we use WLS, with weights 1 + y²_{i-1}. This has the advantages of (i) avoiding restrictions on the moments of the observations, and (ii) ensuring standard normal asymptotics irrespective of whether y_i is stationary or not. The WLS estimators are

  β̂_{k,1} = ( Σ_{i=1}^{k} y_i y_{i-1}/(1 + y²_{i-1}) ) / ( Σ_{i=1}^{k} y²_{i-1}/(1 + y²_{i-1}) ),
  β̂_{k,2} = ( Σ_{i=k+1}^{N} y_i y_{i-1}/(1 + y²_{i-1}) ) / ( Σ_{i=k+1}^{N} y²_{i-1}/(1 + y²_{i-1}) ).

Our test statistics will be functionals of the process

  Q_N(t) = N^{1/2} t(1 − t) ( β̂_{⌊(N+1)t⌋,1} − β̂_{⌊(N+1)t⌋,2} ),  if 2/(N + 1) ≤ t ≤ 1 − 2/(N + 1),   (2.3)

and Q_N(t) = 0 for 0 ≤ t < 2/(N + 1) and for 1 − 2/(N + 1) < t ≤ 1. A "natural" choice to detect the presence of a possible change is to use the sup-norm of (2.3); more generally, we consider weighted statistics of the form

  sup_{0<t<1} |Q_N(t)| / w(t),   (2.4)

where the weight function w(·) satisfies:

Assumption 2.1. It holds that: (i) inf_{δ≤t≤1−δ} w(t) > 0 for all 0 < δ < 1/2; (ii) w(t) is non-decreasing in a neighborhood of 0; (iii) w(t) is non-increasing in a neighborhood of 1.

The functions w(t) satisfying Assumption 2.1 belong to a very wide class; a possible example is w(t) = (t(1 − t))^κ with κ > 0. The existence of the limit of (2.4) can be determined based on the finiteness of the integral functional (see Csörgő and Horváth, 1993)

  I(w, c) = ∫_0^1 (t(1 − t))^{-1} exp( −c w²(t)/(t(1 − t)) ) dt.   (2.5)

As we show below, (2.5) entails that w(t) = (t(1 − t))^κ with 0 < κ < 1/2 can be employed in this context. In order to further enhance the power of our testing procedures, functions which place more weight at the sample endpoints can also be used, i.e. with κ ≥ 1/2.
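As an illustration of the construction above, the following sketch computes the split-sample WLS estimators and the weighted sup-statistic (2.4) with w(t) = (t(1 − t))^κ. It is a simplified version rather than the paper's full procedure: in particular, it omits the variance rescaling by η introduced in Section 3.

```python
import numpy as np

def weighted_cusum_sup(y, kappa=0.25):
    """Weighted CUSUM statistic sup_t |Q_N(t)|/w(t), w(t) = (t(1-t))^kappa,
    built from the split-sample WLS estimators with weights 1 + y_{i-1}^2.
    Input y = (y_0, y_1, ..., y_N); the rescaling by eta is omitted."""
    lag, lead = y[:-1], y[1:]
    num = np.cumsum(lead * lag / (1.0 + lag ** 2))
    den = np.cumsum(lag ** 2 / (1.0 + lag ** 2))
    N = len(lag)
    k = np.arange(2, N - 1)                               # candidate split points
    b1 = num[k - 1] / den[k - 1]                          # WLS on y_1, ..., y_k
    b2 = (num[-1] - num[k - 1]) / (den[-1] - den[k - 1])  # WLS on y_{k+1}, ..., y_N
    t = k / (N + 1.0)
    q = np.sqrt(N) * t * (1.0 - t) * (b1 - b2)            # the process Q_N(t) of (2.3)
    return np.max(np.abs(q) / (t * (1.0 - t)) ** kappa)
```

The cumulative sums make the statistic computable in O(N) time, rather than re-estimating both subsamples at every split point.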
As mentioned above, when κ = 1/2, the corresponding limit theorems will be of the Darling-Erdős type (Darling and Erdős, 1956); when κ > 1/2, the test statistics defined in (2.6) are known as "Rényi statistics" (Horváth et al., 2020b). We begin by assuming that the errors {ε_{i,1}, ε_{i,2}, −∞ < i < ∞} have constant variance (Assumption 3.1). In (1.1), the stationarity or lack thereof of y_i is determined by the value of E ln|β_0 + ε_{0,1}| (see Aue et al., 2006). In particular, if −∞ ≤ E ln|β_0 + ε_{0,1}| < 0, then y_i converges exponentially fast to a strictly stationary solution for all initial values y_0. Conversely, if E ln|β_0 + ε_{0,1}| ≥ 0, then y_i is nonstationary - specifically, |y_i| diverges exponentially fast a.s. when E ln|β_0 + ε_{0,1}| > 0, whereas it diverges in probability, but at a rate slower than exponential, in the boundary case E ln|β_0 + ε_{0,1}| = 0 (see Horváth and Trapani, 2016). We show that the asymptotic variance of the limiting process depends on whether y_i is stationary or not: we therefore study the two cases (stationarity versus lack thereof) separately. The variance of the weak limit of Q_N(t) is denoted by η², defined in (3.1). We require the following notation. In order to study the case κ = 1/2, we define

  a(x) = (2 ln x)^{1/2} and b(x) = 2 ln x + (1/2) ln ln x − (1/2) ln π.   (3.2)

Also, in order to study the case κ > 1/2, we let r_N, γ_1 and γ_2 be defined as in (3.3), and a(κ) as in (3.4).

Assumption 3.2. It holds that r_1(N) → ∞, r_1(N)/N → 0, and r_2(N) → ∞, r_2(N)/N → 0.

We start with the stationary case −∞ ≤ E ln|β_0 + ε_{0,1}| < 0. In this case, the solution of (1.1) under the null hypothesis is close to ȳ_i, the unique anticipative stationary solution of (3.5). We need the following (technical) assumption, to rule out the degenerate case that, under stationarity, the denominator of η² defined in (3.1) is zero with probability 1.

Assumption 3.3. It holds that P{ȳ_0 = 0} < 1.

Theorem 3.1. We assume that H_0 of (1.2), Assumptions 2.1, 3.1 and 3.3 hold, and −∞ ≤ E ln|β_0 + ε_{0,1}| < 0.
(i) If I(w, c) < ∞ for some c > 0, then it holds that

  (1/η) sup_{0<t<1} |Q_N(t)|/w(t) →^D sup_{0<t<1} |B(t)|/w(t),

where {B(t), 0 ≤ t ≤ 1} is a standard Brownian bridge and η is defined in (3.1).
(ii) For all x, it holds that

  lim_{N→∞} P{ a(ln N) (1/η) sup_{0<t<1} |Q_N(t)|/(t(1 − t))^{1/2} ≤ x + b(ln N) } = exp(−2e^{−x}).

(iii) If Assumption 3.2 is satisfied, then it holds that

  (r_N/N)^{κ−1/2} (1/η) sup_{r_1(N)/N ≤ t ≤ 1−r_2(N)/N} |Q_N(t)|/(t(1 − t))^κ →^D max{γ_1 a_1(κ), γ_2 a_2(κ)}

for all κ > 1/2, where r_N, γ_1 and γ_2 are defined in (3.3), and a_1(κ) and a_2(κ) are independent copies of a(κ) defined in (3.4).

We now turn to the nonstationary case. We need an additional technical condition:

Assumption 3.4. It holds that ε_{0,2} has a bounded density.

Theorem 3.2. We assume that H_0 of (1.2), Assumptions 2.1, 3.1 and 3.4 hold, and 0 ≤ E ln|β_0 + ε_{0,1}| < ∞. Then parts (i)-(iii) of Theorem 3.1 continue to hold, where a(x) and b(x) are defined in (3.2), r_N, γ_1 and γ_2 are defined in (3.3), and a_1(κ) and a_2(κ) are independent copies of a(κ) defined in (3.4).

Theorems 3.1 and 3.2 stipulate that the limiting distributions of the weighted CUSUM statistics are the same irrespective of whether y_i is stationary, explosive or at the boundary: the impact of nonstationarity is only on η². Hence, it is important to find an estimator for η² which is consistent in all cases; we denote our estimator by η_N.

Corollary 3.1. The results of Theorems 3.1-3.2 remain true if η is replaced with η_N.

Corollary 3.1 states that the feasible versions of our test statistics, based on η_N, have the same distribution as the infeasible ones, based on η.
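The exact definition of η_N is not recoverable from this version of the text. Purely as an illustration, the sketch below computes a natural sandwich-type plug-in based on full-sample WLS residuals; this is our assumption, not the paper's formula.

```python
import numpy as np

def eta2_hat(y):
    """Illustrative plug-in estimator of eta^2 (NOT the paper's eta_N, whose
    definition is lost in this extraction): a sandwich-type variance built
    from full-sample WLS residuals. Input y = (y_0, ..., y_N)."""
    lag, lead = y[:-1], y[1:]
    w = 1.0 + lag ** 2
    beta_hat = np.sum(lead * lag / w) / np.sum(lag ** 2 / w)  # full-sample WLS
    resid = lead - beta_hat * lag
    num = np.mean((lag * resid) ** 2 / w ** 2)   # variance of the WLS score
    den = np.mean(lag ** 2 / w) ** 2             # squared WLS "Hessian" term
    return num / den
```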
Practically, this means that the test statistics developed above can be implemented with no prior knowledge as to whether y_i is stationary or not.

In the previous section we assumed, as is typical in the RCA literature, that the innovations are homoskedastic; however, heteroskedasticity is often found in applications (see e.g. Xu and Phillips, 2008, for adaptive estimation in autoregressive models). Heteroskedasticity is particularly interesting and challenging in the RCA case: if the distribution of ε_{i,1} is allowed to change, the observations might change from stationarity to nonstationarity even if β_0 does not undergo any change; however, inference on the RCA model will still be asymptotically normal in light of the properties of the WLS estimator discussed above. In this section, we extend all the results above allowing for heteroskedasticity in both ε_{i,1} and ε_{i,2}. Our results are valid also in the baseline case of homoskedasticity, and do not require any explicit knowledge of the form of heteroskedasticity. Changes in the distribution of {ε_{i,1}, ε_{i,2}, 1 ≤ i ≤ N} at times 1 < m_1 < . . . < m_M < N are allowed through the following assumption.

Assumption 4.1. It holds that m_ℓ = ⌊Nτ_ℓ⌋, for 1 ≤ ℓ ≤ M, with 0 < τ_1 < τ_2 < . . . < τ_M < 1.

Henceforth, we will use the notation: m_0 = 0, m_{M+1} = N, τ_0 = 0 and τ_{M+1} = 1. For each subsequence {y_i, m_{ℓ−1} < i ≤ m_ℓ}, 1 ≤ ℓ ≤ M + 1, the condition for stationarity can be satisfied; in this case, the elements of this subsequence can be approximated with stationary variables {ȳ_{ℓ,j}, −∞ < j < ∞} defined by the recursion

  ȳ_{ℓ,j} = (β_0 + ε_{ℓ,j,1}) ȳ_{ℓ,j−1} + ε_{ℓ,j,2},

where ε_{ℓ,j,1} = ε_{j,1} for m_{ℓ−1} < j ≤ m_ℓ, and ε_{ℓ,j,1}, −∞ < j < ∞, j ∉ {m_{ℓ−1} + 1, . . . , m_ℓ}, are independent and identically distributed copies of ε_{m_ℓ,1}. The random variables ε_{ℓ,j,2} are defined in the same way. To allow for changes in the distributions of the errors, we replace Assumptions 3.1-3.4 with:

Assumption 4.2. It holds that: (i)-(iv) the errors satisfy, in each regime, the analogues of Assumption 3.1, with E ε_{m_ℓ,2} = 0, E ε²_{m_ℓ,2} = σ²_{m_ℓ,2} and E|ε_{m_ℓ,2}|⁴ < ∞; (v) if E ln|β_0 + ε_{m_ℓ,1}| < 0, then P(ȳ_{ℓ,0} = 0) < 1, 1 ≤ ℓ ≤ M + 1; (vi) if E ln|β_0 + ε_{m_ℓ,1}| ≥ 0, then ε_{m_ℓ,2} has a bounded density, 1 ≤ ℓ ≤ M + 1.

By Assumption 4.2, the WLS estimator may have different variances in the various regimes. In order to study the limit theory, we introduce the regime-specific quantities η²_ℓ and a_ℓ in (4.1)-(4.2), and define the zero mean Gaussian process with covariance function η_0(t, s) given in (4.3). We begin by investigating how the limits in Theorems 3.1 and 3.2 behave under heteroskedasticity.

Theorem 4.1. We assume that H_0 of (1.2), and Assumptions 2.1, 4.1 and 4.2 hold. (i) If I(w, c) < ∞ for some c > 0, then the weighted statistic (2.4) converges in distribution to the corresponding functional of the Gaussian process with covariance kernel η_0(t, s). (ii) For all x, the Darling-Erdős limit of Theorem 3.1(ii) continues to hold. (iii) If Assumption 3.2 is also satisfied, then the Rényi limit of Theorem 3.1(iii) continues to hold for all κ > 1/2.

Theorem 4.1 is only of theoretical interest, but we point out that heteroskedasticity impacts only on part (i). In that case, the limiting distribution of the weighted Q_N(t) is given by a Gaussian process with covariance kernel η_0(t, s). Parts (ii)-(iii) of the theorem are the same as in the case of homoskedasticity: intuitively, their limiting distribution is driven only by the observations which are as close to the sample endpoints as o(N). On these intervals, (4.3) ensures that the asymptotic variance η_0(t, t) is proportional to t(1 − t). Finally, note that, in light of the definitions of η_0(t, t), η²_ℓ and a_ℓ, heteroskedasticity in ε_{i,2} does not play a role in the nonstationary case.

4.1. Feasible tests under heteroskedasticity. By Theorem 4.1, the implementation of tests based on Q_N(t) requires an estimate of η_0(t, t). However, this is fraught with difficulties, since it requires knowledge of the different regime dates, m_ℓ.
Thus, we consider a modification of Q_N(t) to reflect the possible changes in the variances of the errors; the modified test statistic is built from the weighted partial-sum processes c_N,1(t) and c_N,2(t). Under the null of no change, the same arguments as in the proof of Corollary 3.1 guarantee that c_N,1(t) and c_N,2(t) converge to the functions c_1(t) and c_2(t) = c_1(1) − c_1(t), for 0 ≤ t ≤ 1, where τ_ℓ is defined in Assumption 4.1, and a_ℓ, 1 ≤ ℓ ≤ M + 1, is defined in (4.2). In order to present our main results, we define the zero mean Gaussian process {Θ(t), 0 ≤ t ≤ 1} in (4.7); the process {∆(t), 0 ≤ t ≤ 1} entering its construction is also a zero mean Gaussian process with E∆(t)∆(s) = b(min(t, s)), where b(·) is given in (4.8). Let g(t, s) = E(Θ(t)Θ(s)); elementary calculations yield the explicit expression in (4.9).

Theorem 4.2. We assume that H_0 of (1.2), and Assumptions 2.1, 4.1 and 4.2 hold. (i) If I(w, c) < ∞ for some c > 0, then the modified statistic converges weakly to the corresponding functional of {Θ(t), 0 ≤ t ≤ 1}, the Gaussian process defined in (4.7). (ii) A Darling-Erdős limit theorem holds, where a(x), b(x) are defined in (3.2) and g(t, s) is given in (4.9). (iii) If Assumption 3.2 also holds and κ > 1/2, then a Rényi-type limit theorem holds, where t_1 = r_N/N, t_2 = 1 − t_1, r_N, γ_1, γ_2 are defined in (3.3), a_1(κ) and a_2(κ) are independent copies of a(κ) defined in (3.4), and g(t, s) is given in (4.9).

Some comments on the practical implementation of the results in Theorem 4.2 are in order. Parts (ii) and (iii) require an estimate of g(t, t); to this end, we use c_N,1(t) defined in (4.5) instead of c_1(t), and we estimate b(t, s) accordingly; we can then define the estimator of g(t, t) in (4.10). The implementation of part (i) of Theorem 4.2 is more complicated, since the presence of nuisance parameters is not relegated to a multiplicative function. We reject the null hypothesis with c(α) defined through P{sup_{0<t<1} |Θ(t)|/w(t) ≤ c(α)} = 1 − α.

Our consistency result, involving the break size ∆_N and conditions (4.12), (4.14) and (4.16), with conclusions (4.13), (4.15) and (4.17) for the three weighting regimes, ensures that, as long as (4.12), (4.14) and (4.16) hold, our tests reject the null with probability (asymptotically) 1. Conditions (4.12), (4.14) and (4.16) essentially state that breaks will be detected as long as they are "not too small", and "not too close" to the endpoints of the sample. Consider (4.12). This condition can be understood by considering two cases. First, when k*/N → c ∈ (0, 1), it is required that N^{1/2} ∆_N → ∞: this entails that β_A may depend on the sample size N, so that even small changes in the regression parameter are allowed. When ∆_N > 0 is fixed, the condition restricts only how close the break can be to the sample endpoints. Turning to (4.14), when k*/N → c > 0, the test is powerful as long as ∆_N does not shrink too quickly: again small changes are allowed for, but these are now "less small" by an O(ln ln N) factor. Conversely, when ∆_N > 0, (4.12) holds as long as k*(ln ln N)^{−1/2} → ∞: breaks that are as close as O(√(ln ln N)) periods to the sample endpoints can be detected. This effect is reinforced in the case of Rényi statistics, where, on account of (4.16), the only requirement is that k* > r_N.

We provide some Monte Carlo evidence on the performance of the test statistics proposed in Section 4.1. Data are generated using (1.1). In all experiments, we use β_0 ∈ {0.5, 0.75, 1, 1.05} to consider both the cases of stationary and nonstationary y_i. We have experimented also with different values of β_0, but results are essentially the same. Under the alternative, we consider both a mid-sample and an end-of-sample break. The shocks ε_{i,1} and ε_{i,2} are simulated as independent of one another and i.i.d. with distributions N(0, σ²_1) and N(0, σ²_2) respectively. We report results for σ²_1 = 0.01 and σ²_2 = 0.5 - the value of σ²_1 is based on "typical" values as found e.g. in the empirical applications in Horváth and Trapani (2019).
We note, however, that in unreported simulations using different values of σ²_1 and σ²_2, the main results do not change, except for the (expected) fact that tests have better properties (in terms of size and power) for smaller values of σ²_2. Similarly, the test performs better (with empirical rejection frequencies closer to their nominal value) when σ²_1 is larger, and tends to be undersized for smaller values of σ²_1. Both effects (of σ²_1 and σ²_2) vanish as N increases. When allowing for heteroskedasticity, we generate ε_{i,1} and ε_{i,2} as i.i.d. N(0, σ²_1) and i.i.d. N(0, σ²_2) for 1 ≤ i ≤ N/2, and i.i.d. N(0, 1.5σ²_1) and i.i.d. N(0, 1.5σ²_2) for N/2 + 1 ≤ i ≤ N. Finally, we generate N + 1,000 values of y_i from (1.1) - with y_0 = 0 - and discard the first 1,000 values. All our routines are based on 2,000 replications, and we use critical values corresponding to a nominal level equal to 5%; hence, empirical rejection frequencies under the null have a 95% confidence interval [0.04, 0.06]. We consider four different cases: (i) homoskedasticity in both ε_{i,1} and ε_{i,2}; (ii) homoskedasticity in ε_{i,1} and heteroskedasticity in ε_{i,2}; (iii) homoskedasticity in ε_{i,2} and heteroskedasticity in ε_{i,1}; and, finally, (iv) heteroskedasticity in both ε_{i,1} and ε_{i,2}. Critical values are taken from Table I in Gombay and Horváth (1996). From Tables 5.1 and 5.2, all tests work very well in all cases considered, possibly being slightly worse in the fully homoskedastic case. Tests never over-reject, not even in small samples - conversely, there are some cases of (severe) under-rejection in small samples, especially when κ is around 0.5. As N increases, however, this vanishes and the empirical rejection frequencies all lie within their 95% confidence interval. The only exception is the Rényi statistic with κ = 0.51, which is severely undersized even in large samples. We also consider the case of end-of-sample breaks (5.2); results are in Figures C.5-C.8. These show, essentially, the same pattern as above: all test statistics have monotonic power in ∆, and whilst heteroskedasticity in ε_{i,2} does not affect the whole picture, heteroskedasticity in ε_{i,1} gives very different results, with its presence increasing power especially for β_0 ≥ 1. However, the impact of κ here is, as expected, completely reversed: the power versus breaks that occur at the end of the sample increases monotonically, ceteris paribus, with κ. This makes a difference particularly in the case of medium-sized changes - e.g., when ∆ = 0.35, increases in power from κ = 0 to κ = 1 are in the region of 10-15%. Finally, in Figures C.9-C.10 we report a small-scale exercise where we evaluate the empirical rejection frequencies when β_0 is close to unity. We only consider heteroskedasticity in ε_{i,2}: results for other cases are available upon request and, in general, no major differences are noted compared to the other results. These "boundary" cases should help shed more light on the performance of our procedure when detecting changes from stationarity to nonstationarity (when β_0 < 1 and changes are positive), and vice versa (when β_0 > 1 and changes are negative). The main message of Figures C.9-C.10 is that our tests work very well in these boundary cases. In particular, the tests are very effective in detecting changes from stationarity to explosive behaviour, and vice versa. The power is especially high when β_0 > 1 - i.e. when the RCA process changes from an explosive to a stationary behaviour.
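A sketch of this data-generating process, under the design just described (burn-in of 1,000 observations, y_0 = 0, variances multiplied by 1.5 in the second half of the retained sample when heteroskedasticity is switched on); function and argument names are ours.

```python
import numpy as np

def simulate_mc_path(N, beta0, betaA=None, kstar=None,
                     s1=0.01, s2=0.5, hetero1=False, hetero2=False, rng=None):
    """Monte Carlo DGP sketch for Section 5: generate N + 1000 values of y_i
    from (1.1) with y_0 = 0 and discard the first 1000 (burn-in). Under
    heteroskedasticity, the relevant variance is multiplied by 1.5 for the
    second half of the retained sample."""
    rng = np.random.default_rng(rng)
    burn = 1000
    T = N + burn
    sd1 = np.full(T, np.sqrt(s1))
    sd2 = np.full(T, np.sqrt(s2))
    if hetero1:
        sd1[burn + N // 2:] *= np.sqrt(1.5)
    if hetero2:
        sd2[burn + N // 2:] *= np.sqrt(1.5)
    eps1 = rng.normal(0.0, sd1)
    eps2 = rng.normal(0.0, sd2)
    y = np.zeros(T + 1)                  # y_0 = 0
    for i in range(1, T + 1):
        beta = beta0
        if betaA is not None and kstar is not None and i - burn > kstar:
            beta = betaA                 # AMOC break at k* within the retained sample
        y[i] = (beta + eps1[i - 1]) * y[i - 1] + eps2[i - 1]
    return y[burn:]                      # (y_0, ..., y_N) after burn-in
```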
The high power against changes from explosive to stationary behaviour suggests a possible, effective test to detect e.g. the collapse of a bubble in financial econometrics applications.

We illustrate our approach through three applications to real data. In Sections 6.1-6.2, we use economic and financial time series; in Section 6.3, we use Covid-19 data.

6.1. Application I: inflation. Existing analyses for several countries show that not only the average level of inflation (as is well documented), but also its serial correlation, may be subject to numerous changes. We use monthly CPI data taken from the FRED dataset over a period spanning from January 1913 until January 2021, with N = 1297. We use monthly inflation rates, calculated as the month-on-month log differences of the series. Given that the series is quite long, we expect to see more than one break; hence, we use binary segmentation (as suggested in Vostrikova, 1981), reporting the point in time at which the relevant test statistic is maximised as the breakdate estimate. Results are reported in Table 6.1.

6.2. Application II: stock market data. We use the logs of the original data, with no further transformations. For each estimated changepoint, we indicate which statistic has detected it, and at which (nominal) level. When more than one statistic finds a break, the estimated date is computed as the majority vote across statistics, using the point in time at which the relevant statistic finds a break as estimator. Whilst details are available upon request, we note that breaks were detected in this order (from the first to be detected to the last): a break in 1973; a break in 1987; a break in 1999. We found three changepoints in the whole series (see Table 6.2). The first one, whose date corresponds to the well-known 1973-74 market crash (due to the collapse of the Bretton-Woods system, and compounded by the oil shock), is relatively close to the beginning of the sample, and indeed it has been identified by the Rényi statistics (the other tests do not identify such a break). The second changepoint can also be related to a specific event, i.e. the Black Monday (the break is found in November 1987, i.e. one month after the actual event). Finally, Rényi statistics do not find the third changepoint, which occurs mid-sample, confirming the idea that mid-sample breaks are better detected using milder weight functions (indeed, not even the Darling-Erdős test finds evidence of such a break); the break is found before the collapse of the dot-com bubble (traditionally dated around March 2000), reflecting the trouble brewing in the months leading to the event.

6.3. Application III: Covid-19 UK hospitalisation data. In this section, we consider UK data on Covid-19 - in particular, we use data on hospitalisations rather than cases, as the latter may be less reliable due to changes in the number of tests administered. Shtatland and Shtatland (2008), inter alia, advocate using a low-order autoregression as an approximation of the popular SIR model, especially as a methodology for the early detection of outbreaks. In this context, the autoregressive root is of crucial importance since, as the authors put it, if "the parameter is greater than one, we have an explosive case (an outbreak of epidemic)". It is therefore important to check whether the observations change from an explosive to a stationary regime (meaning that the epidemic is slowing down), or vice versa whether the change occurs from a stationary to an explosive regime (i.e., the epidemic undergoes a surge, or "wave"). In this respect, the empirical exercise in this section should be read in conjunction with Figures C.9-C.10.
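The binary segmentation scheme used throughout Section 6 admits a very short implementation. In the sketch below, `test` is a placeholder for any of the changepoint tests above, assumed to return a reject/accept decision together with the (strictly interior) location maximising the chosen statistic.

```python
def binary_segmentation(y, test, min_len=20):
    """Recursive binary segmentation (Vostrikova, 1981): apply `test` to a
    segment; if a changepoint is found, record it, split there, and recurse
    on both halves. `test(segment)` is assumed to return (reject, k_hat),
    where k_hat is the argmax of the chosen weighted CUSUM statistic,
    strictly inside the segment."""
    breaks = []

    def recurse(lo, hi):
        if hi - lo < min_len:
            return
        reject, k_hat = test(y[lo:hi])
        if reject:
            breaks.append(lo + k_hat)
            recurse(lo, lo + k_hat)
            recurse(lo + k_hat, hi)

    recurse(0, len(y))
    return sorted(breaks)
```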
We use (logs of) UK daily data, for the four UK nations and for the various regions of England (the data are available from https://ourworldindata.org/grapher/uk-daily-covid-admissions?tab=chart&stackMode=absolute&time=2020-03-29..latest&region=World), again using binary segmentation to detect multiple breaks. We only report results obtained using Rényi statistics (with κ = 0.51, 0.55, 0.65, 0.75, 0.85 and 1); the other tests give very similar results, available upon request. As far as breakdates are concerned, we pick the ones corresponding to the "majority vote" across κ, although discrepancies are, when present, in the region of a few days (2-5 at most). The results in Table 6.3 suggest that, with the exception of Wales, there were multiple breaks in all series considered (Figure 6.1 contains the same information, albeit limited to the four UK nations, to save space); we note that Wales is an outlier as regards hospital admissions, because these are counted in a different way than in the rest of the UK (specifically, Wales also reports suspected Covid-19 cases, whereas all the other nations only report confirmed cases; see https://www.cebm.net/covid-19/the-flaw-in-the-reporting-of-welsh-data-on-covid-hospitaladmissions/). Some breaks occur close to the sample endpoints, highlighting the importance of using Rényi statistics. Also, all changepoints indicate a transition of the autoregressive coefficient β_0 around unity.

Notes to Table 6.3: all series end at 30 January 2021. We use the logs of the original data (plus one, given that, on some days, hospitalisations are equal to zero): no further transformations are used. All changepoints have been detected by all Rényi-type tests - no discrepancies were noted. Detected changepoints, and their estimated dates, are presented in chronological order; breakdates have been estimated as the points in time where the majority of tests identifies a changepoint. Whilst details are available upon request, we note that breaks were detected in this order (from the first to be detected to the last): breaks in August; breaks in April and January; breaks in October-November; breaks in December. For each changepoint, we report in square brackets, for reference, the left and right WLS estimates of β_0.

Differences between pre- and post-break values of β_0 are small, but sufficient to trigger, or quench, an outbreak - on account of the Monte Carlo evidence contained in Figures C.9-C.10, we would not expect spurious detection of breaks when these are absent. Considering first the regions of England, all of these experience a break in early April as a consequence of the first national lockdown, which started on March 23rd, 2020, but was preceded by growing concerns, and closures in the education and hospitality sectors, the week before. Similarly, all series have a subsequent change (with β_0 exceeding unity after the break) in late August - one exception is London, where the change occurred in early August. These breaks indicate the beginning of the "second wave" in the UK, which has been ascribed (also) to an increase in travelling during the holiday season, and which was officially acknowledged by the PM on September 18th, 2020. The breaks in autumn, where present, can be explained as the effect of the local and national lockdowns which were implemented at the end of October, and of the easing of restrictions in early December.
Finally, all series have a change towards stationarity around mid-January, which again can be explained as the effect of the national lockdown announced on January 4th, 2021, and of the growing concerns about a third wave voiced before and during the Christmas holidays. The same picture applies to England as a whole. Conversely, the other UK nations experienced slightly different patterns, likely as a consequence of the different policies implemented by local governments. With the exception of Wales, which seems to have only one break (but note the caveat about Welsh data mentioned above), Scotland and Northern Ireland are essentially aligned with the results for England in terms of the effects of the first lockdown, the summer holiday, and the third lockdown.

In this paper, we study changepoint detection in the deterministic part of the autoregressive coefficient of a Random Coefficient AutoRegressive model. We use the CUSUM process based on comparing the left and right WLS estimators. In order to be able to detect changepoints close to the sample endpoints, we study weighted statistics, where more weight is placed at the sample endpoints. We consider a very wide class of weighting functions, studying: (i) weighting schemes based on functions w(t) which drift to zero at the sample endpoints more slowly than (t(1 − t))^{1/2}; (ii) the standardised CUSUM process, for which we derive a Darling-Erdős theorem; and (iii) even heavier weights, corresponding to the Rényi statistics.

We discuss the computation of asymptotic critical values, and report some of them (Section A.1). Further, in Section A.2, we study the power versus more general alternatives than the AMOC one considered in Section 4.2.

A.1. Computation of critical values and further simulations under homoskedasticity. The asymptotic critical values for the homoskedastic case are in Table A.1. When κ < 1/2, we have simulated the critical values using the algorithm proposed in Franke et al. (2020); our results differ marginally from the values reported in the original paper, but unreported experiments show that our critical values yield less undersizing than the original ones, at least in small samples. In the case κ > 1/2, we know that critical values are the same for both the homoskedastic and the heteroskedastic case. In all experiments (and in the computation of critical values), we use symmetric trimming - i.e., r_1(N) = r_2(N); in this case, it is easy to see that P[max(a_1(κ), a_2(κ)) ≤ c_α] = (P[a(κ) ≤ c_α])², and our critical values are based on Table 1 in Horváth et al. (2004). The most critical case is κ = 1/2. For a given nominal level α, asymptotic critical values are given by c_α = −ln(−0.5 ln(1 − α)), and Theorems 3.1-4.2 state that the limiting distribution of the max-type statistics is the same in both the homoskedastic and the heteroskedastic cases. Interestingly (and contrary to the heteroskedastic case), our simulations show that, in the homoskedastic case, asymptotic critical values work well, with no under-rejection and good power. To complement Section 5, we also report, as a benchmark, some evidence on the size of our tests under homoskedasticity, using the theory in Section 3. Empirical rejection frequencies under the null are reported in Table A.2 (using the critical values in Table A.1). Broadly speaking, all test statistics have the correct size for large samples; as mentioned above, this also includes the Darling-Erdős statistic, based on asymptotic critical values, despite the notoriously slow convergence to the extreme value distribution. When N = 200 (i.e., in small samples), tests are, occasionally, mildly oversized.
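As an illustration of these recipes: the closed-form critical value for κ = 1/2 and the associated rejection rule follow directly from (3.2); for κ < 1/2, a crude Monte Carlo stand-in for the algorithm of Franke et al. (2020) simulates sup_t |B(t)|/(t(1 − t))^κ on a grid. The grid discretisation slightly understates the sup, so the grid should be fine; all names below are ours.

```python
import numpy as np

def de_critical_value(alpha):
    """kappa = 1/2: asymptotic critical value c_alpha = -ln(-0.5 ln(1 - alpha))."""
    return -np.log(-0.5 * np.log(1.0 - alpha))

def de_reject(sup_stat, N, alpha=0.05):
    """Reject H0 when a(ln N) * sup_stat - b(ln N) > c_alpha, with a(x) and
    b(x) as in (3.2). `sup_stat` is the standardised weighted CUSUM sup."""
    x = np.log(N)
    a = np.sqrt(2.0 * np.log(x))
    b = 2.0 * np.log(x) + 0.5 * np.log(np.log(x)) - 0.5 * np.log(np.pi)
    return a * sup_stat - b > de_critical_value(alpha)

def bridge_critical_value(kappa, alpha, n_grid=2000, n_rep=20000, rng=None):
    """kappa < 1/2: Monte Carlo critical value for sup_t |B(t)|/(t(1-t))^kappa,
    with B a Brownian bridge simulated on a grid."""
    rng = np.random.default_rng(rng)
    t = np.arange(1, n_grid) / n_grid              # interior grid points
    sups = np.empty(n_rep)
    for r in range(n_rep):
        inc = rng.normal(0.0, np.sqrt(1.0 / n_grid), n_grid)
        w = np.cumsum(inc)                         # Brownian motion on the grid
        b = w[:-1] - t * w[-1]                     # bridge B(t) = W(t) - t W(1)
        sups[r] = np.max(np.abs(b) / (t * (1.0 - t)) ** kappa)
    return np.quantile(sups, 1.0 - alpha)
```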
This also happens when κ = 0.45 (and when κ > 0.5 with β_0 = 1.05); conversely, the test is grossly undersized under almost any circumstances when κ = 0.51, although this seems to improve as N increases. We note that, with the few exceptions mentioned above, the value of β_0 does not affect the empirical rejection frequencies in any obvious way (the case β_0 = 1 is marginally worse than the other ones for small N, but this vanishes as N increases).

A.2. Consistency versus multiple breaks. As a complement to the results in Section 4.2, we briefly discuss the power of our tests against the alternative of R changes, with k_0 = 0 and k_{R+1} = N. For the sake of simplicity we require:

Assumption A.1. It holds that k_ℓ = ⌊Nτ_ℓ⌋, where 0 < τ_1 < τ_2 < . . . < τ_R < 1, with τ_0 = 0 and τ_{R+1} = 1.

Extending the theory developed above, it can be shown that, for all 1 ≤ ℓ ≤ R and 1 ≤ i ≤ N for which −∞ ≤ E ln|β_ℓ + ε_{i,1}| < ∞, the limits in (A.1) exist. Elementary arguments give that (A.1) implies Corollary A.1, which states that our test has power even in the presence of many breaks, and that consistency (or lack thereof) depends on the largest break only, irrespective of the magnitude and number of all the other ones. The proof follows from the same arguments as that of Theorem 4.3.

We begin by stating some preliminary facts: if H_0 holds, then the representation in (B.1) is satisfied. Under the null hypothesis, the recursion is y_i = ρ_i y_{i−1} + ε_{i,2}, where ρ_i = β_0 + ε_{i,1} (see (B.2)), and we can solve the recursion in (B.2) explicitly; the resulting representation, together with condition (B.3), is used repeatedly below. We are now ready to state our technical lemmas. The first one states that we can replace y_i with ȳ_i in the sums in (B.1), and that the difference will be small. In the proof, we can assume that 0 < ν_3 = min(ν_1, ν_2)/3 < 1, which completes the proof of (B.6); the same arguments give (B.7) and (B.8). The representation in (B.12) leads to the following result.

Lemma B.2. If H_0 of (1.2) and Assumption 3.1 are satisfied, and (B.3) holds, then the bound in (B.16) holds for all α > 0, with some c > 0.

Proof. We begin by showing (B.16). Using (B.5), we have ȳ_0 = z_ℓ + u_ℓ and y_{0,ℓ} = z_ℓ + u*_ℓ. Consider II first. Using (B.9), (B.10) and (B.11), we get, via Markov's inequality, the required bound with some constants ν_1 > 0, ν_2 > 0, ν_3 > 0 and c_1, c_2 = c_2(β); similarly, the analogous bound can be shown for the remaining term. In (B.13), we can assume that ℓ is large enough that ℓ^{−β} < 1/8; therefore, on the set U_ℓ, the required inequality holds. Putting everything together, the bound follows with some c_1 > 0 and ζ_1 < 1/2. It follows from (C.1) and the Law of the Iterated Logarithm (LIL henceforth) that the corresponding statement holds for all ζ_2 < 1/2. Therefore, using Taylor's expansion, we obtain the desired approximation from (C.2). The proofs of the approximations in Aue et al. (2014) then yield a bound with some ζ_3 < 1/2; putting everything together, we get the result for all ζ_4 > 0. It follows from (B.1), (C.4) and (C.5) that the approximation holds with some ζ_5 < 1/2. The computation of the covariance function shows that the limiting process is a Brownian bridge. Let 0 < δ < 1/2. Recalling Assumption 2.1(i), the required convergence follows from (C.6) and (C.7). Csörgő et al. (1986) proved that, if I(w, c) < ∞ for some c > 0, then the weighted approximation holds; hence, for every x > 0, the corresponding limit holds, where the limiting process is a Brownian bridge. The first part of Theorem 3.1 now follows from putting everything together. We now turn to proving part (ii) of the theorem. Let c(N) = (ln N)^4.
First, we observe that the initial segment is negligible and, by (C.7), so is the final one. Hence the LIL for the Wiener process implies a bound for the maximum normalised by (2 ln ln N)^{1/2}. Putting all these results together, we conclude that the extreme segments do not contribute to the limit. Further, the approximations in (C.6) and (C.7) yield control over the range c(N) ≤ k ≤ N/2. Theorem A.4.2 in Csörgő and Horváth (1997) then states the limit for all x, which implies the second part of the theorem. Finally, the proof of part (iii) follows from similar arguments.

Proof of Theorem 3.2. We begin by showing that the approximations in (C.6) and (C.7) hold in the nonstationary case too. These approximations immediately imply the limit results in the present theorem, repeating exactly the same passages as in the proof of Theorem 3.1. We begin by noting that Lemma A.4 of Horváth and Trapani (2016) implies that there are two constants, 0 < δ < 1 and c_1 > 0, such that (C.12) holds. Equation (C.12), in turn, implies (C.13), with some constant c_2. Using (C.13) and Markov's inequality, we obtain, for all x > 0 and ζ_1 > 1 − δ, the corresponding tail bound. Hence there is ζ_2 < 1 such that, for all x > 0, (C.14) holds; we obtain the next bound immediately from (C.14) and (C.15). Note that we can assume - without loss of generality - that c_ℓ ≤ 1 since, along the lines of the proof of (C.13), c_ℓ = O(ℓ^{−δ}) as ℓ → ∞. Theorem 3.1 of Móricz et al. (1982) then implies a maximal inequality; hence, by Markov's inequality, for all x > 0, a bound for the maximum follows with the choice ζ_3 > (1 − δ)/2. Hence, there exists a ζ_4 < 1/2 such that (C.18) holds. Following the proofs of (C.18) and (C.19), one can also verify the analogous statement with some ζ_5 < 1/2. Combining (C.16), (C.17), (C.18), (C.19), (C.20) and (C.21), it is easy to see that it is possible to find a ζ_6 < 1/2 such that the required approximation holds. Finally, by the Komlós-Major-Tusnády approximation (see Komlós et al., 1975, and Komlós et al., 1976), we can define two independent Wiener processes {W_{N,1}(x), 0 ≤ x ≤ N/2} and {W_{N,2}(x), 0 ≤ x ≤ N/2} satisfying (C.24) and (C.25). Putting together (C.22), (C.23), (C.24) and (C.25), it finally follows that (C.26) holds for some ζ_7 < 1/2. The desired results now follow from repeating exactly the same passages as in the proof of Theorem 3.1.

Proof of Corollary 3.1. We note the decomposition of η_N in (C.27); hence, we have (C.28). The proofs of Theorems 3.1 and 3.2 show that, irrespective of y_i being stationary or not, (C.29) holds. We study the leading term in (C.29), showing that, when E ln|β_0 + ε_{0,1}| < 0, it is negligible at rate N^{−ζ} for some ζ > 0. We begin by showing a preliminary bound, which follows from the definition of y_i and Assumption 3.1(ii). Recall that Aue et al. (2006) show that θ_1 = E|β_0 + ε_{0,1}|^k < 1 for some 0 < k < 1, and consider the construction in which ε*_{j,1} and ε*_{j,2} are completely independent, and independent of ε_{j,1} and ε_{j,2}, with the same marginal distributions, so that the coupling error is easily controlled. Finally, consider the covariance terms: having used the fact that ε²_{i,1} y⁴_{i−1}(N) and ε²_{j,1} y⁴_{j−1}(N) are independent for |i − j| > 2θ⌊ln N⌋, and putting everything together, it follows that the bound holds for some ζ > 0, whence (C.31) follows immediately. Using exactly the same logic, it is also possible to show the companion bound which, together with (C.31), finally implies (C.30). The same logic also yields the remaining terms, whence we have finally shown that η_N = η + O_P(N^{−ζ}) when y_i is stationary. It remains to show that the corollary also holds in the nonstationary case E ln|β_0 + ε_{0,1}| ≥ 0. Following the proof of (B.16), it is easy to see that the corresponding approximations hold; the same logic yields the analogous bounds, so that, in this case, it holds that η_N = η + O_P(N^{−1}).

Proof of Theorem 4.1. We assume, without loss of generality, that M ≥ 1 - the M = 0 case is already covered in Theorem 3.1.
We begin by showing two preliminary sets of results: (i) that the approximations developed in the proofs of Theorems 3.1 and 3.2 are valid on each segment (m_{ℓ−1}, m_ℓ], 1 ≤ ℓ ≤ M + 1, with only the variances of the approximating Gaussian processes depending on ℓ; and (ii) that the approximating process on a segment is independent of the approximating processes on the other segments. Following the proofs of Lemma B.1 (in the case −∞ ≤ E ln|β_0 + ε_{m_ℓ,1}| < 0) and of Theorem 3.2 (in the case 0 ≤ E ln|β_0 + ε_{m_ℓ,1}| < ∞), it can be shown that the segmentwise approximation holds with some ζ < 1/2 for all 1 ≤ ℓ ≤ M + 1; this entails that we can replace the partial sums of (y²_{i−1} ε_{i,1} + y_{i−1} ε_{i,2})/(1 + y²_{i−1}) with the partial sums of the z_i's. Recall that the results in Aue et al. (2014) and Berkes et al. (2014) are based on blocking arguments: thus, W_{N,ℓ,1}(k) and W_{N,ℓ,2}(k) are, as well as being independent of each other, independent across ℓ. We thus obtain the approximations in (C.33)-(C.34), with some 0 < ζ < 1/2.

We now prove part (i) of the theorem; the proof is based on the same arguments as used in the proof of Theorem 3.1. Let 0 < δ < min{τ_1, τ_M}; the desired weak convergence on [δ, 1 − δ] then follows, having used Assumption 2.1 in the second passage, and (C.33)-(C.34) in the last one; a further bound follows having used (C.33). Using (C.8), the relevant tail bound follows for all x > 0, and similar arguments yield the symmetric bound for all x > 0. Putting everything together proves part (i) of the theorem. The proof of part (ii) is based again on the approximations in (C.33) and (C.34): we first derive bounds for the maxima over the initial interval and, by similar passages, the same bounds for all W_{N,ℓ,1}(Nt), 1 < ℓ; the same results can be shown, with the same logic, over the intervals 1 − t_1 < t < 1 and 1/2 < t < 1 − t_2. Next, combining (C.40) with (C.33), the corresponding bound follows; the same result can be shown, with exactly the same logic, on the interval 1 − t_2 < t < 1 − t_1. Further, using (C.39) and the Darling-Erdős theorem, (C.35) follows, and the proof of Theorem 4.1(ii) is complete. Part (iii) follows by repeating the proof of part (ii) with minor modifications, and therefore we only report some passages. Let r_{1,N} = r_{2,N} = r_N for simplicity; it is easy to see that the analogue of (C.41) holds, which also entails the corresponding bounds. By using the same logic as in the proof of (C.38), and by the same token, further bounds follow, and the same holds on the interval 1 − (r_N/N)^{1/2} ≤ t < 1 − r_N/N. Thus, we only need to focus on finding the limiting distributions of the suprema over the intervals r_N/N ≤ t < (r_N/N)^{1/2} and 1 − (r_N/N)^{1/2} ≤ t < 1 − r_N/N; on these intervals, however, Γ_N(Nt) can be suitably bounded, with some ζ > 0 and some ζ_1 > 0. Thus the stated limits hold, and we can use the results in the proof of Theorem 4.1 to establish Theorem 4.2.

Proof of Theorem 4.3. The proof follows from standard arguments (see e.g. Csörgő and Horváth, 1997), and therefore we only report the most important passages. We begin by considering (4.13), and show that it holds under (4.12). Under our assumptions, the same arguments as above yield the required divergence; hence (4.13) follows immediately. As far as (4.17) is concerned, we begin by noting that, under the conditions of Theorem 4.2, and following the same passages as in the proof of that theorem, one can show that, if |β_0 − β_A| is bounded, then, as N → ∞, there are functions c*_1(t) and b*(t) such that (C.51) and (C.52) hold; these entail that there exists a function g*(t, s) such that (4.13) follows as long as the corresponding condition is satisfied, having let ⌊Nt*⌋ = k*.
But this follows immediately from (C.49) and (C.50). Finally, (4.15) follows from the same logic.

(Figure: monthly log difference of CPI, with detected breaks marked.)