Shift identification in time varying regression quantiles

Subhra Sankar Dhar; Weichi Wu

2020-11-12

Abstract: This article investigates whether time-varying quantile regression curves are the same up to a horizontal shift. The errors and the covariates involved in the regression model are allowed to be locally stationary. We formalize this issue as a non-parametric hypothesis testing problem, and develop an integrated-squared-norm based test (SIT) as well as a simultaneous confidence band (SCB) approach. The asymptotic properties of the SIT and the SCB under the null hypothesis and under local alternatives are derived. Moreover, the asymptotic properties of these tests are also studied when the compared data sets are dependent. We then propose valid wild bootstrap algorithms to implement the SIT and the SCB. Furthermore, the usefulness of the proposed methodology is illustrated by analysing simulated and real data related to the COVID-19 outbreak and to climate science.

1 Introduction

Koenker and Bassett (1978) proposed the concept of quantile regression as an alternative to least squares estimation (LSE); it provides a conditional quantile surface describing the relation between a univariate response variable and univariate/multivariate covariates. In the 1980s and early 1990s, many research articles were published on parametric quantile regression (see, e.g., Ruppert and Carroll (1980), Koenker and Bassett (1982), Efron (1991), Gutenbrunner and Jureckova (1992)), and from the 1990s onwards several non-parametric quantile regression methods were developed as well (see, e.g., Chaudhuri (1991), Koenker et al. (1994), Yu and Jones (1998), Takeuchi et al. (2006)).
Once estimation in quantile regression had been developed for both parametric and non-parametric models, the comparison of quantile regression curves started to receive attention in the literature (see, e.g., Dette et al. (2011) and the references therein), since quantile curves describe features of the conditional distribution of the response variable given the covariates. Motivated by these powerful statistical properties of quantile curves, we study the following research problem. Consider two regression models with a common response variable and the same covariates for two different, possibly dependent groups. Formally speaking, suppose that $(y_{i,1}, x_{i,1})_{i=1}^{n_1}$ and $(y_{i,2}, x_{i,2})_{i=1}^{n_2}$ are two sets of data, where the covariates $x_{i,1} = (x_{i,1,1}, \ldots, x_{i,p_1,1})^\top$ and $x_{i,2} = (x_{i,1,2}, \ldots, x_{i,p_2,2})^\top$ are $p_1 \times 1$ and $p_2 \times 1$ vectors, respectively. Now, for $\tau \in (0,1)$ and $s = 1$ and $2$, we define the conditional quantiles
$$Q_\tau(y_{i,s} \mid x_{i,s}) := \inf\{u : F_{y_{i,s}\mid x_{i,s}}(u \mid x_{i,s}) > \tau\} = \theta_{1,\tau,s}\Big(\tfrac{i}{n_s}\Big) x_{i,1,s} + \cdots + \theta_{p_s,\tau,s}\Big(\tfrac{i}{n_s}\Big) x_{i,p_s,s}. \quad (1.1)$$
Note that model (1.1) can be written as
$$y_{i,s} = x_{i,s}^\top \theta_{\tau,s}\Big(\tfrac{i}{n_s}\Big) + e_{i,\tau,s}, \qquad i = 1, \ldots, n_s, \ s = 1 \text{ and } 2, \quad (1.2)$$
where for $s = 1$ and $2$, $\theta_{\tau,s}(t) = (\theta_{1,\tau,s}(t), \ldots, \theta_{p_s,\tau,s}(t))^\top$ is a $p_s \times 1$ vector with each element being a smooth function on $[0,1]$, and the errors $e_{i,\tau,s}$ satisfy $Q_\tau(e_{i,\tau,s} \mid x_{i,s}) = 0$. The last condition, on the $\tau$-th quantile of the conditional distribution of the errors given the covariates, ensures that model (1.2) is identifiable. In particular, we allow $x_{i,s}$ and $e_{i,\tau,s}$ to be locally stationary and correlated with each other, which captures a distinctly complex dependence structure of the covariates and the errors. Further technical assumptions on $x_{i,s}$ and $e_{i,\tau,s}$ will be discussed explicitly in Section 3. We are now interested in the following hypothesis testing problem.
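The check-loss characterization underlying the conditional quantiles in (1.1) can be made concrete numerically. The following is an illustrative Python sketch (not the authors' code): it fits a linear quantile regression by directly minimizing the Koenker–Bassett check loss, where the data-generating model, sample size and optimizer are arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def quantile_regression(y, X, tau):
    """Fit beta by minimizing sum_i rho_tau(y_i - x_i' beta)."""
    obj = lambda b: np.sum(check_loss(y - X @ b, tau))
    return minimize(obj, np.zeros(X.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(0)
n, tau = 2000, 0.5
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(0, 1, n)                 # median(e) = 0, so Q_0.5(e | x) = 0
y = 1.0 + 2.0 * x + e                   # true coefficients (1, 2)
beta = quantile_regression(y, X, tau)   # should be close to (1, 2)
```

At `tau = 0.5` this is median regression; changing `tau` traces out the conditional quantile surface mentioned above.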
For pre-specified vectors $c_1 \in \mathbb{R}^{p_1 \times 1}$ and $c_2 \in \mathbb{R}^{p_2 \times 1}$, with $c_s = (c_{1,s}, \ldots, c_{p_s,s})^\top$, define $m_s(t) = c_s^\top \theta_{\tau,s}(t)$ for $s = 1, 2$, and we want to test
$$H_0 : m_1(t) = m_2(t + d) \ \text{ for } 0 < t < 1 - d \text{ and some unknown constant } d \in [0, 1). \quad (1.3)$$
Let us now discuss a special case. When $d = 0$, $H_0$ is equivalent to testing the equality of $m_1(t)$ and $m_2(t)$ for $t \in (0,1)$. In this case, when $c_1 = (1, 0, \ldots, 0)^\top$ and $c_2 = (1, 0, \ldots, 0)^\top$, the problem coincides with comparing the curves $\theta_{1,\tau,1}(t)$ and $\theta_{1,\tau,2}(t)$ for $t \in (0,1)$, and such a comparison can be carried out by an appropriate functional notion of the difference between the estimated $\theta_{1,\tau,1}(t)$ and $\theta_{1,\tau,2}(t)$. Such problems have already been explored in the literature (see Munk and Dette (1998)). However, the testing problem described in (1.3) is fundamentally different from the aforesaid case. Firstly, we are comparing certain linear combinations of the components of the quantile coefficients of (1.1); this is not a direct comparison between particular quantiles of two different distributions. Secondly, note that in (1.1) and (1.2) the quantiles are time varying, which is entirely different from the usual regression quantiles; we will discuss time varying regression models further in the next paragraph. Finally, in (1.3) we are checking whether there is a nonnegative shift between the two functions $m_1$ and $m_2$. It should be pointed out that the hypothesis in (1.3) can also be formulated for negative $d$, but without loss of generality we study $d \geq 0$. Moreover, the model described in (1.3) with respect to the time parameter is often applicable to real data, for example, the Gross Domestic Product (GDP) curves of two nations over a fixed period of time, or the survival rates of women aged over sixty-five in two different nations over a long period of time.
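The characterization exploited later — that $m_1(t) = m_2(t+d)$ forces $m_2^{-1}(u) = m_1^{-1}(u) + d$ on the overlap, and hence $(m_1^{-1})'(u) = (m_2^{-1})'(u)$ there — can be verified numerically. Below is a minimal sketch with hypothetical monotone curves ($m_2(t) = t^2 + t$ and $d = 0.1$ are arbitrary illustrative choices):

```python
import numpy as np

d = 0.1
m2 = lambda t: t**2 + t            # strictly increasing on [0, 1]
m1 = lambda t: m2(t + d)           # under H0: m1(t) = m2(t + d)

t1 = np.linspace(0.0, 1.0 - d, 20001)   # domain of m1
t2 = np.linspace(0.0, 1.0, 20001)       # domain of m2
u = np.linspace(m1(0.05), m1(0.8), 50)  # interior of the overlap (m1(0), m1(1-d))

# np.interp inverts an increasing function given its sampled graph
m1_inv = np.interp(u, m1(t1), t1)
m2_inv = np.interp(u, m2(t2), t2)

shift = m2_inv - m1_inv                                   # constant = d under H0
delta = np.gradient(m2_inv, u) - np.gradient(m1_inv, u)   # Delta(u), vanishes under H0
```

The constancy of `shift` and the vanishing of `delta` are exactly the two facts that the shift estimator and the test statistic of Section 2 are built on.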
For the usual mean-based time varying models, this type of problem was studied by Gamboa et al. (2007), Vimond (2010), Collier and Dalalyan (2015) and a few references therein. However, none of them studied such problems in the framework of quantile regression (i.e., (1.1) or (1.3)) for time varying models. In this article, we thoroughly study the problem of testing (1.3) assuming the functions $m_1(t)$ and $m_2(t)$ are strictly monotone on $[0,1]$. Then the null hypothesis (1.3) holds if and only if $(m_1^{-1}(u))' - (m_2^{-1}(u))' = 0$ when $u$ belongs to a certain interval, a subset of $(0,1)$ (see Section 2). Therefore, the testing of (1.3) is carried out based on a Bahadur representation of the time varying quantile regression coefficients and a Gaussian approximation to the estimated difference $\Delta(u) := (m_1^{-1}(u))' - (m_2^{-1}(u))'$. We also discuss the relaxation of the monotonicity assumption. Our major contributions are the following. The first major contribution is to develop a formal $L^2$ test for checking the hypothesis (1.3) for dependent and non-stationary data. The test, which we call the squared integrated test (SIT), has the form $\int \hat{\Delta}^2(t) w(t)\,dt$, where $w(t)$ is a certain weight function and $\hat{\Delta}(t)$ is a suitable smooth estimate of $\Delta(t)$. Approximating the SIT test statistic by a quadratic form and establishing a central limit theorem for the quadratic form (see de Jong (1987)), the asymptotic distribution of the SIT statistic is derived under the null hypothesis (i.e., the hypothesis described in (1.3)) and under local alternatives. In this context, we would like to mention that there have been a few research articles on conditional quantiles of independent data; the reader is referred to Zheng (1998), Horowitz and Spokoiny (2002), He and Zhu (2003), Kim (2007) and a few references therein.
However, none of the above articles considered the more widely applicable hypothesis testing problem that we consider here (see (1.3)) for time varying quantile regression models (see (1.1)). The second major contribution is to develop the simultaneous confidence band (SCB) for the difference between $(m_1^{-1}(\cdot))'$ and $(m_2^{-1}(\cdot))'$, and an asymptotic property of the SCB is derived, which asserts the form of the simultaneous confidence band of this difference for a preassigned level of significance $\alpha \in (0,1)$. Specifically, the $100(1-\alpha)\%$ SCB of $\Delta(t)$, $t \in I := (m_1(0), m_1(1-d))$, is $(L_\alpha(t), U_\alpha(t))$, where
$$\lim_{n \to \infty} \mathbb{P}\big(L_\alpha(t) \leq \Delta(t) \leq U_\alpha(t) \ \text{for all } t \in \hat{I}_n\big) = 1 - \alpha. \quad (1.4)$$
Here $\hat{I}_n$ is a consistent estimate of the interval $I$ under the null hypothesis, and $(L_\alpha(t), U_\alpha(t))$ depend on the sample size; they are obtained from the approximate formula for the maximum deviation of Gaussian processes (see Sun and Loader (1994)) based on Weyl's volume-of-tube formula. It is clear from (1.4) that one can use the SCB as a graphical device, i.e., a band containing $\Delta(u)$ with a prescribed probability. Moreover, one can also estimate the type-I error and the power of the corresponding test associated with the SCB using the one-to-one correspondence between confidence bands and tests of hypotheses. Earlier, Zhou (2010) derived asymptotically correct simultaneous confidence bands of quantile curves for dependent and locally stationary data when the covariates are fixed. For random design points, Wu and Zhou (2017) studied the limiting properties of simultaneous confidence bands for the functional considered in their article, which is different from the key term of our work. The third major contribution is to propose a robust bootstrap procedure to ensure good finite-sample performance of the SIT and the SCB tests.
In principle, one can carry out the SIT test and construct the SCB using the results in Theorems 3.1 and 3.2. However, for small or moderate sample sizes, directly implementing those results may not produce satisfactory performance because of the slow convergence rate. To overcome this problem, a bootstrap method is proposed, and a better rate of convergence of the bootstrap method is established as well. The reader may also consult Zhou and Wu (2010) and the references therein. It is now an appropriate place to mention that Dette et al. (2021) addressed an apparently similar-looking hypothesis; however, that article studies an entirely different hypothesis, and its content differs substantially from ours. The differences are the following. Firstly, Dette et al. (2021) investigates shift invariance in the mean, while our article studies shift invariance in quantiles. It should be pointed out that the structure of the data at different quantiles can be heterogeneous for non-stationary time series (see the real data analysis), and therefore our proposed methodology is able to capture richer features than the mean-based method of Dette et al. (2021). Secondly, Dette et al. (2021) is restricted to examining the means of time series, while our method is applicable to regression quantiles with stochastic and dependent covariates. Finally, this article derives the SCB in addition to the SIT, which provides a quantitative rule for using the graphical device of Dette et al. (2021) to determine the shift among curves. The rest of the article is organized as follows. In Section 2, we characterize the null hypothesis stated in (1.3), which is a key observation for the subsequent theoretical studies. We estimate the time varying quantile regression coefficients $\theta_{\tau,s}(t)$ using local linear quantile estimators.
Specifically, for $s = 1$ and $2$, the local linear quantile estimate of $(\theta_{\tau,s}(t), \theta'_{\tau,s}(t))$ is denoted by $(\hat{\theta}_{\tau,s,b_{n,s}}(t), \hat{\theta}'_{\tau,s,b_{n,s}}(t))$, where
$$(\hat{\theta}_{\tau,s,b_{n,s}}(t), \hat{\theta}'_{\tau,s,b_{n,s}}(t)) = \operatorname*{argmin}_{\eta_0,\, \eta_1} \sum_{i=1}^{n_s} \rho_\tau\Big(y_{i,s} - x_{i,s}^\top\big(\eta_0 + \eta_1(\tfrac{i}{n_s} - t)\big)\Big) K_{b_{n,s}}\big(\tfrac{i}{n_s} - t\big), \quad (2.1)$$
$\rho_\tau(\cdot)$ is the check function, $K$ is a kernel function with $K_{b_{n,s}}(\cdot) = K(\cdot / b_{n,s})$, and $b_{n,s}$ is the bandwidth sequence associated with the $s$-th sample ($s = 1$ and $2$). Note that local linear (quantile) estimators have been extensively studied in the non-parametric statistics literature for both independent and dependent data; see, for example, Yu and Jones (1998), Chaudhuri (1991), Dette and Volgushev (2008), Qu and Yoon (2015), Wu and Zhou (2017), Wu and Zhou (2018b) among many others. Among them, Wu and Zhou (2017) investigated the estimator (2.1) with locally stationary covariates and errors. Locally stationary processes have been developed in the literature to model slowly changing stochastic structure, which is found in many real-world time series; see, for instance, Dahlhaus (1997), Zhou and Wu (2009), Dette and Wu (2020), Dahlhaus et al. (2019). These articles motivated us to work on the hypothesis (1.3) assuming local stationarity. We now estimate $m_1(t)$ and $m_2(t)$ through a bias-corrected estimate $\tilde{\theta}_{\tau,s} = (\tilde{\theta}_{\tau,s,1}, \ldots, \tilde{\theta}_{\tau,s,p_s})^\top$ for $s = 1$ and $2$. That is, $\hat{m}_s(t) = c_s^\top \tilde{\theta}_{\tau,s}(t)$, where for $s = 1$ and $2$ and $1 \leq j \leq p_s$ the component $\tilde{\theta}_{\tau,s,j}(t)$ is a jackknife combination of estimates computed at two bandwidths. The superscript inside the parentheses denotes the bandwidth used for the corresponding estimator. The estimator $\hat{\theta}^{(b_{n,s})}_{\tau,s,j}(t)$ has a bias of the order $O(b^2_{n,s})$, which is non-negligible and hard to evaluate; therefore, such de-biased estimators have been widely applied in non-parametric inference, see for example Schucany and Sommers (1977) and Wu and Zhao (2007). The superscript will be omitted in the rest of the article for notational simplicity. Suppose that $H$ is a smooth kernel function, $h_s$ ($s = 1$ and $2$) is a sufficiently small bandwidth, and $N$ is a sufficiently large number.
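To fix ideas, here is a minimal univariate ($p_s = 1$) sketch of a local linear quantile estimator of the form (2.1), written as illustrative Python rather than the authors' code; the Epanechnikov kernel, the Nelder–Mead optimizer and the simulated design are assumptions for the example, and the bias correction and bandwidth selection discussed in the text are omitted.

```python
import numpy as np
from scipy.optimize import minimize

def epanechnikov(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def local_linear_quantile(t, y, x, tau, b):
    """Minimize the kernel-weighted check loss over the local level/slope
    (eta0, eta1) of theta_tau around time t; returns theta_hat_tau(t)."""
    n = len(y)
    tt = np.arange(1, n + 1) / n
    w = epanechnikov((tt - t) / b)
    def obj(eta):
        r = y - x * (eta[0] + eta[1] * (tt - t))
        return np.sum(w * r * (tau - (r < 0)))      # weighted check loss
    return minimize(obj, np.array([np.median(y), 0.0]),
                    method="Nelder-Mead").x[0]

rng = np.random.default_rng(1)
n, tau, b = 2000, 0.5, 0.15
tt = np.arange(1, n + 1) / n
x = 1.0 + rng.uniform(0, 1, n)            # positive scalar covariate
theta = 2.0 * tt                          # true time-varying coefficient
y = theta * x + rng.normal(0, 0.5, n)     # errors with conditional median 0
est = local_linear_quantile(0.5, y, x, tau, b)   # target: theta(0.5) = 1.0
```

Evaluating `local_linear_quantile` on a grid of `t` values traces out the whole coefficient curve.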
We then estimate $(m_1^{-1})'(t)$ and $(m_2^{-1})'(t)$ by $\hat{g}_1(t)$ and $\hat{g}_2(t)$, respectively, where
$$\hat{g}_s(t) = \frac{1}{N h_s} \sum_{i=1}^{N} H\Big(\frac{\hat{m}_s(i/N) - t}{h_s}\Big), \qquad s = 1 \text{ and } 2. \quad (2.4)$$
Notice that $N$ is not the sample size; it is used for the Riemann approximation. Further, observe that
$$\frac{1}{h_s}\int_0^1 H\Big(\frac{m_s(u) - t}{h_s}\Big)\, du \to \big((m_s)^{-1}\big)'(t)\, \mathbf{1}(m_s(0) < t < m_s(1)) \quad \text{as } h_s \to 0,$$
where $\mathbf{1}(A)$ denotes the indicator function of the set $A$. Therefore, the estimator defined in (2.4) is a smooth approximation to the step function $((m_s)^{-1})'(t)\,\mathbf{1}(m_s(0) < t < m_s(1))$ and is differentiable with respect to $t$. This type of estimator was proposed by Dette et al. (2006) and studied extensively by Dette and Wu (2019) for locally stationary time series models. Now, using $\hat{g}_s$ ($s = 1$ and $2$), one can estimate $m_s^{-1}(t)$ ($s = 1$ and $2$) by $\hat{m}_s^{-1}(t) = \int_{\hat{m}_s(0)}^{t} \hat{g}_s(u)\, du$. This motivates the estimation of the horizontal shift $d$ under the null hypothesis as follows. Note that under the null,
$$m_2^{-1}(u) = m_1^{-1}(u) + d \quad \text{for } u \in (m_1(0), m_1(1-d)), \quad (2.7)$$
and this drives us to estimate $d$ by $\hat{d}$, refined from the preliminary estimator $\tilde{d} = \hat{m}_2^{-1}(\hat{m}_1(0))$ obtained by letting $u = m_1(0)$ in (2.7). With this $\hat{d}$, one can estimate the endpoints of the interval in (2.7), i.e., $a := m_1(0)$ and $b := m_1(1-d)$. Let $\hat{a}$ and $\hat{b}$ be the estimators of $a$ and $b$, respectively, where $\hat{a} := \hat{m}_1(0)$ and $\hat{b} := \hat{m}_1(1 - \hat{d})$ under the null hypothesis; their properties under the null and the alternative are discussed in detail in Proposition D.3 of the supplementary material. Next, to formulate the test statistic, we use the fact in Proposition 2.1 and propose the SIT and the SCB tests for the hypothesis (1.3) based on $\hat{g}_1(t) - \hat{g}_2(t)$. For the SIT test, the test statistic is defined as
$$T_{n_1,n_2} = \int_{\hat{a}+\eta}^{\hat{b}-\eta} \big(\hat{g}_1(t) - \hat{g}_2(t)\big)^2\, w(t)\, dt, \quad (2.9)$$
where $\eta = \eta_{n_1,n_2}$ is a positive sequence that diminishes sufficiently slowly as $n_1, n_2 \to \infty$; for instance, one may let $\eta_{n_1,n_2}$ vanish at the rate $1/\log(n_1+n_2)$. The purpose of introducing $\eta$ is to avoid issues at the boundary points; for details, see Remark D.1 of the supplementary material. Observe that $T_{n_1,n_2}$ is an estimate of the distance between $(m_1^{-1})'(t)$ and $(m_2^{-1})'(t)$ in the $L^2$ sense, and we reject the null hypothesis when $T_{n_1,n_2}$ is large enough.
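The smoothing step in (2.4) can be illustrated in isolation. The sketch below is hypothetical code, with a Gaussian choice of $H$ and the true curve standing in for $\hat{m}_s$: it estimates $(m^{-1})'(t)$ by the Riemann-sum estimator and can be compared with the exact derivative $1/m'(m^{-1}(t))$.

```python
import numpy as np

def g_hat(m_vals, t, h):
    """Estimator of type (2.4): (1/(N h)) * sum_i H((m(i/N) - t)/h),
    with H the standard normal density."""
    z = (m_vals - t) / h
    return np.mean(np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)) / h

N, h = 5000, 0.02
grid = np.arange(1, N + 1) / N
m_vals = grid**2 + grid          # m(u) = u^2 + u, increasing from 0 to 2

t0 = 0.75                        # = m(0.5); truth: (m^{-1})'(t0) = 1/m'(0.5) = 0.5
est = g_hat(m_vals, t0, h)
```

Integrating `g_hat` from $\hat{m}_s(0)$ up to $t$ then reproduces the inverse-curve estimate $\hat{m}_s^{-1}(t)$, from which the preliminary shift estimate $\hat{m}_2^{-1}(\hat{m}_1(0))$ follows.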
The second test is the simultaneous confidence band centered around $\hat{g}_1(t) - \hat{g}_2(t)$, whose detailed expression is provided in the statement of Theorem 3.2. Using the relation between hypothesis testing and confidence bands, the SCB test rejects at significance level $\alpha$ if the zero curve is not entirely contained in the $100(1-\alpha)\%$ SCB. Remark 2.1 We now discuss the relaxation of the monotonicity assumption on $m_1(t)$ and $m_2(t)$. Consider $0 = a_0 < a_1 < \ldots < a_{k+1} = 1$ for some $k > 0$ such that $m_1(t)$ is strictly monotone on each interval $(a_i, a_{i+1})$, with vanishing order $s_i$ at the turning points; $\max_{1 \leq i \leq k} s_i$ is called the maximal vanishing order of $m_1(t)$ and is at least 1. Then, by the argument of (2.5), $\hat{g}_1(t)$ approximately equals $g_1(t)$ defined in (2.10). Similarly, $\hat{g}_2(t)$ approximates $g_2(t)$, where $g_2(t)$ is defined analogously to $g_1(t)$. By the decompositions (B.16) and (B.17) in the supplemental material and the proof of Theorem 4.1 in Dette and Wu (2019), we conjecture that our proposed bootstrap tests (i.e., Algorithms 4.1 and 4.2) will be consistent for the hypothesis $g_1(t) = g_2(t)$, with the rate of detectable local alternatives adjusted by a function of $h_1$, $h_2$ and the maximal vanishing orders of $m_1(t)$ and $m_2(t)$. Note that under the null hypothesis of shift invariance (1.3), $g_1(t) = g_2(t)$. Therefore, if we exclude from the alternatives the pairs of curves belonging to the class $\mathcal{F} := \{(m_1, m_2) : m_1(t) \neq m_2(t+d) \text{ for all } d, \ g_1(t) = g_2(t)\}$, our proposed testing procedure is consistent and asymptotically correct. Notice that by Proposition 2.1 no pair of strictly monotone functions $(m_1(t), m_2(t))$ belongs to $\mathcal{F}$. Simulation studies in Table 5 support this observation, while we leave the theoretical justification as future work. On the other hand, to the best of our knowledge, there is no test of monotonicity for time series data in the literature. In this section, we investigate the asymptotic properties of $T_{n_1,n_2}$ and the asymptotic form of the SCB at a presumed significance level $\alpha$.
We start with a few concepts and assumptions for the model described in (1.2). Let $(\zeta_i)_{i \in \mathbb{Z}}$ be i.i.d. random vectors, and define the filtrations $\mathcal{F}_{i,s}$ and $\mathcal{G}_{i,s}$ for $s = 1$ and $2$ accordingly. We assume that the covariates and the errors are both locally stationary processes in the sense of Zhou and Wu (2009). For any random vector $v$, write $\|v\|_q = (\mathbb{E}|v|^q)^{1/q}$ for its $L^q$ norm, $q \geq 1$. Let $\chi \in (0,1)$ be a fixed constant, and let $M$ and $\eta$ denote sufficiently large and sufficiently small positive constants, respectively, whose values may vary from line to line. For any positive semi-definite matrix $\Sigma$, write $\lambda_{\min}(\Sigma)$ for its smallest eigenvalue. We first state the following set of conditions, which enable us to study the deviation of the nonparametric quantile estimator $\hat{\theta}_\tau - \theta_\tau$. (A1) Define $\theta_{\tau,s}(t) = (\theta_{1,\tau,s}(t), \ldots, \theta_{p_s,\tau,s}(t))^\top$ for $s = 1$ and $2$. Assume that $(\theta_{i,\tau,s}(t), 1 \leq i \leq p_s)_{s=1,2}$ are Lipschitz continuous on $[0,1]$. (A3) For the error processes, we assume that the stated dependence condition holds for $s = 1$ and $2$ and a constant $v \geq 1$. (A4) For the covariate processes, we assume that for $s = 1$ and $2$ there exists a constant such that (3.5) holds. (A5) For the conditional densities, define for $s = 1$ and $2$ and $0 \leq q \leq 2p_s + 1$ the corresponding quantities, and assume the stated smoothness and dependence conditions on them. (A6) For $s = 1$ and $2$, conditional on $\mathcal{G}_{i,s}$, define the conditional density $f_s$ and the quantile design matrix $\Sigma_s(t)$; assume that $\lambda_{\min}(\Sigma_s(t)) \geq \eta > 0$ and that $\sum_i f_s(i/n_s, 0 \mid \mathcal{G}_i)\, x_{i,s} x_{i,s}^\top K_{b_{n,s}}(i/n_s - t)/(n_s b_{n,s})$ converges to the non-degenerate quantile design matrix. Similar conditions are also assumed in Kim (2007), Qu (2008) and a few references therein. (A7) Let $\psi_\tau(x)$ be the left derivative of $\rho_\tau(x)$, and for $s = 1$ and $2$ define the gradient vector process $U_s(\tfrac{i}{n_s}, \mathcal{F}_{i,s}, \mathcal{G}_{i,s}) = \psi_\tau(e_{i,s}) x_{i,s}$. Define the long-run covariance matrices of $U_s$, and assume that for $s = 1$ and $2$ there exists an $\tilde{\eta} > 0$ bounding them away from degeneracy.
Condition (A7) means that the long-run covariance matrices of the gradient vectors $U_s(t, \mathcal{F}_{i,s}, \mathcal{G}_{i,s})$ are non-degenerate. Condition (A8) is a mild condition on the kernels; the well-known Epanechnikov kernel and many other kernel functions satisfy the assumptions stated in (A8). Notice that conditions (A1)-(A7) generalize conditions (A1)-(A5) of Wu and Zhou (2017) to multiple curves. We then consider a few more conditions (B) on the bandwidths and the regression functions. Condition (B1) implies that in practice we should choose $h_s$ small, as was also remarked by Dette et al. (2006). Further, (B1) ensures that our proposed estimators $\hat{a}$ and $\hat{b}$ are well defined under the alternative hypothesis, and (B2) means that $d \in (0,1)$ under the null. Condition (B3) guarantees that the nonparametric estimate $\hat{m}_s$ approximates $m_s$ well, $s = 1, 2$. Before stating the main results on $T_{n_1,n_2}$ and the SCB, we introduce some more notation. Define for $s = 1$ and $2$, $T_{n,s} = (b_{n,s}, 1 - b_{n,s})$ and $M_{c_s}(t) = \big((c_s^\top \Sigma_s^{-1}(t))\, V_s^2\, (c_s^\top \Sigma_s^{-1}(t))^\top\big)^{1/2}$. Write $m_{2,1}(u) = m_2^{-1}(m_1(u))$, $\check{g}_{1,2}(m_1(u)) = \check{g}_1(m_1(u))\, \check{g}_2(m_1(u))\, w(m_1(u))\, m_1'(u)$ and
$$V_{12}(r) = \int_{\mathbb{R}} \int_{\mathbb{R}} \check{g}^2_{1,2}(m_1(u)) \Big(\int_{\mathbb{R}} K'(x)\, K'\big(r\, m'_{2,1}(u)\, x + y\big)\, dx\Big)^2 du\, dy,$$
where $\check{g}_s$, $s = 1$ and $2$, denote the corresponding limiting versions of $\hat{g}_s$. Notice that under the null hypothesis (1.3), $m'_{2,1}(u) \equiv 1$. Theorem 3.1 Assume conditions (A1)-(A8) and (B1)-(B3), and consider local alternatives $(m_1^{-1})'(t) - (m_2^{-1})'(t) = \rho_n \kappa(t)$ for some bounded function $\kappa(t)$, where $\rho_n := \rho_{n_1,n_2} = (n_1 b_{n_1}^{5/2})^{-1/2}$. We then have that $T_{n_1,n_2}$, suitably centered and scaled, is asymptotically standard normal. Under the null hypothesis, $\kappa \equiv 0$; therefore, Theorem 3.1 suggests rejecting the null hypothesis (1.3) whenever the studentized statistic exceeds $z_{1-\alpha}$, where $\alpha$ is the significance level, $z_{1-\alpha}$ is the $(1-\alpha)$-th quantile of the standard normal distribution, and $\hat{B}_1$, $\hat{B}_2$ and $\hat{V}_T$ are appropriate estimates of the asymptotic bias parameters $B_1$, $B_2$ and the asymptotic variance $V_T$, respectively.
Moreover, Theorem 3.1 shows that the SIT test is able to detect alternatives that converge to the null at the rate $(n_1 b_{n,1}^{5/2})^{-1/2}$, with asymptotic power expressed through $\Phi(\cdot)$, the CDF of a standard normal random variable. Further, assume that for $s = 1$ and $2$, $b_{n,2}/b_{n,1} \to c_{b,2} \in (0,1)$, $n_2/n_1 \to c_{n,2} \in (0,1)$, $h_2/h_1 \to c_{h,2} \in (0,1)$, $\eta = o(1)$, $\eta^{-1} = O(\log(n_1+n_2))$, and let $c_{b,1} = c_{n,1} = c_{h,1} = 1$; define the quantities in (3.20). If $(m_1^{-1})'(t) - (m_2^{-1})'(t) = \rho_{n_1,n_2} \kappa(t)$ for some non-zero bounded function $\kappa(t)$ with $\rho_{n_1,n_2} = o(\eta)$, then as $\min(n_1,n_2) \to \infty$ we obtain the conclusion of Theorem 3.2, where $I_{\hat{a},\hat{b}} = (\hat{a} + \eta, \hat{b} - \eta)$. Theorem 3.2 gives us the following simultaneous confidence band for $(m_1^{-1})'(t) - (m_2^{-1})'(t)$, where $\hat{K}_1(t)$ and $\hat{K}_2(t)$ are appropriate estimates of $K_1(t)$ and $K_2(t)$, respectively; if the band excludes the zero curve, we reject the null hypothesis (1.3) at significance level $\alpha$. Furthermore, it follows from condition (B) and (3.22) that the width of the band (3.23) shrinks at a rate governed by the bandwidths. Consequently, the SCB test is able to detect alternatives converging to the null at a slower rate, which indicates that the SIT test is asymptotically more powerful than the SCB test when the bandwidths are of the same order. However, for moderately large sample sizes the SCB test performs well when $(m_1^{-1})'(t) - (m_2^{-1})'(t)$ is 'bumpy', or equivalently when the major part of the two curves $m_1(t)$ and $m_2(t)$ is the same up to a horizontal shift while minor parts of $m_1(t)$ and $m_2(t)$ have notably different shapes, so that their differences cannot be eliminated by a horizontal shift. 4 Implementation of the tests The implementation of the SIT test and the SCB test requires the estimation of $M_{c_s}(t)$. For $s = 1$ and $2$, let $\hat{e}_{i,s}$ denote the residuals, and estimate $\Sigma_s(t)$ with a bandwidth $w_{n,s}$ such that $w_{n,s} = o(1)$ and $n_s w_{n,s} \to \infty$, where $\phi(\cdot)$ is the probability density function of the standard normal distribution.
It has been shown in Theorem 6 of Wu and Zhou (2017) that, with appropriate choices of $w_{n,s}$, $\hat{\Sigma}_s(t)$ is a consistent estimator of $\Sigma_s(t)$; see (4.26). With appropriate choices of bandwidth, Theorem 5 of Wu and Zhou (2017) establishes the consistency of the remaining component. Consequently, $\hat{M}_{c_s}(t)$ is a consistent estimator of $M_{c_s}(t)$ under appropriate choices of $M_s$ and $w_{n,s}$, which will be discussed in the next section. In this section, we first discuss the choices of the smoothing parameters, namely $b_{n,s}$ and $h_s$ ($s = 1$ and $2$), for calculating $T_{n_1,n_2}$. According to Dette et al. (2006), when $h_s$ is sufficiently small it has a negligible impact on the test; therefore, in view of the bandwidth conditions (B), we recommend choosing $h_s = n_s^{-1/3}$ as a rule of thumb. For $b_{n,s}$, we propose to choose this tuning parameter by a corrected Generalized Cross Validation (C-GCV) method (see Craven and Wahba (1978)). Notice that for local linear regression with bandwidth $b$, the estimator can be written as $\hat{Y}_s = D_s(b) Y_s$ for some $n_s \times n_s$ matrix $D_s(b)$, and the GCV then selects $\hat{b}_{n,s,\mathrm{mean}} = \arg\min_b \mathrm{GCV}(b)$. Following the arguments of Yu and Jones (1998), it is appropriate to select $\hat{b}_{n,s}$ by correcting $\hat{b}_{n,s,\mathrm{mean}}$. First, we define $\hat{b}^o_{n,s} = 2 C_s \hat{b}_{n,s,\mathrm{mean}}$, where the correction factor $C_s$ involves the error process of the local linear regression for the $s$-th sample ($s = 1$ and $2$); we refer to Zhou and Wu (2010) for the estimation of $\hat{M}_s(t)$. Then, as recommended by Zhou (2010), for the SIT test we use $\hat{b}_{n,s} = \hat{b}^o_{n,s} \times n_s^{-1/45}$, while for the SCB test we use $\hat{b}_{n,s} = \hat{b}^o_{n,s}$. We now discuss the selection of $w_{n,s}$ and $M_s$ for the estimation of the quantity $\hat{M}_{c_s}(t)$ of Section 4. As a rule of thumb, we propose to choose $M_s = n_s^{1/3}$ and to select $w_{n,s}$ by the minimum volatility method. Specifically, consider a grid of possible $w_{n,s}$ values: $\{w_{s,1}, \ldots, w_{s,k}\}$. Together with $M_s$ and $b_{n,s}$, one can calculate $\hat{M}_{c_s,1}, \ldots, \hat{M}_{c_s,k}$ using $\{w_{s,1}, \ldots, w_{s,k}\}$, respectively.
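A bare-bones version of the GCV step for the mean-regression pilot fit (before the correction by $C_s$) might look as follows. This is an illustrative sketch, not the paper's implementation: the hat-matrix construction, the bandwidth grid and the simulated data are assumptions for the example.

```python
import numpy as np

def ll_hat_matrix(tt, b):
    """Hat matrix D(b) of a local linear mean smoother on design points tt."""
    n = len(tt)
    D = np.empty((n, n))
    for j, t in enumerate(tt):
        u = (tt - t) / b
        w = 0.75 * np.maximum(1.0 - u**2, 0.0)       # Epanechnikov weights
        X = np.column_stack([np.ones(n), tt - t])
        WX = w[:, None] * X
        coef_rows = np.linalg.solve(X.T @ WX, WX.T)  # (X'WX)^{-1} X'W
        D[j] = coef_rows[0]                          # row producing the fit at t
    return D

def gcv(y, D):
    """GCV(b) = (RSS/n) / (1 - tr(D)/n)^2."""
    n = len(y)
    resid = y - D @ y
    return (np.sum(resid**2) / n) / (1.0 - np.trace(D) / n) ** 2

rng = np.random.default_rng(5)
n = 200
tt = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * tt) + rng.normal(0, 0.3, n)
grid_b = [0.05, 0.1, 0.2, 0.4]
b_gcv = min(grid_b, key=lambda b: gcv(y, ll_hat_matrix(tt, b)))
```

In the paper's procedure this pilot bandwidth would then be rescaled by the correction factor and, for the SIT test, by the additional $n_s^{-1/45}$ undersmoothing factor.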
Then, for a positive integer $u$ ($u = 5$, say), define the integrated squared error measure in (4.31). Now, let $l$ be the minimizer of $\mathrm{ise}(\hat{M}_{c_s,u}(l))$, and we select $w_{s,l+u/2}$ as $w_{n,s}$. The validity of these methods for choosing $w_{n,s}$ and $M_s$ is given in Wu and Zhou (2017), which also proposed tuning-parameter methods for refinement; for simplicity, we omit the detailed description of the refinement procedure in this paper. Our empirical study finds that our choices of the tuning parameters $b_{n,s}$, $h_s$, $M_s$, $w_{n,s}$ and the estimate of $M_{c_s}(t)$ work reasonably well. Let $\{V_{j,1}, j \in \mathbb{Z}\}$ and $\{V_{j,2}, j \in \mathbb{Z}\}$ be i.i.d. standard normal random variables. Theorems 3.1 and 3.2 are built on the fact that the distribution of $\hat{g}_1(t) - \hat{g}_2(t)$ can be well approximated by that of a Gaussian process $Z(t, \{V_{j,s}\}_{j \in \mathbb{Z}, s=1,2})$. The limiting distribution is established via the asymptotic limit of a quadratic form of the Gaussian process $Z(t, \{V_{j,s}\}_{j \in \mathbb{Z}, s=1,2})$ for Theorem 3.1, and via the convergence of extreme values of $Z(t, \{V_{j,s}\}_{j \in \mathbb{Z}, s=1,2})$ for Theorem 3.2. However, the direct implementation of Theorems 3.1 and 3.2 is difficult: the former involves a complicated bias term of the order $b_{n,s}^{-1/2}$ to be estimated, and the latter has a slow convergence rate $O(1/\sqrt{\log n_s})$, which follows from the proof of Theorem 3.2. To circumvent this difficulty, we propose the following bootstrap-assisted algorithms based on $Z(t, \{V_{j,s}\}_{j \in \mathbb{Z}, s=1,2})$. In Algorithm 4.1, one simulates replicates of the statistic, obtaining $M^{(1)}, \ldots, M^{(Q)}$; the p-value of this test is given by $1 - Q^*/Q$, where $Q^* = \max\{r : M^{(r)} \leq T_{n_1,n_2}\}$. Algorithm 4.2 proceeds analogously: (a) estimate $m_1$ and $m_2$, $a$, $b$ and $M_{c_1}(\cdot)$, $M_{c_2}(\cdot)$; then simulate to obtain the statistic in (4.33). By applying Algorithm 4.1, there is no need to estimate the bias terms $B_1$ and $B_2$ or the asymptotic variance $V_T$. The validity of these algorithms is based on the approximation of $\hat{g}_1(t) - \hat{g}_2(t)$ by $Z(t, \{V_{j,s}\}_{j \in \mathbb{Z}, s=1,2})$ (see (2.4) for the expressions of $\hat{g}_1(t)$ and $\hat{g}_2(t)$), which is discussed in detail in the proof of Theorem 3.1.
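The p-value step of a multiplier-bootstrap algorithm of this kind can be sketched schematically. In the toy Python version below (not the paper's Algorithm 4.1: the process $Z$ is replaced by a generic smoothed i.i.d.-normal process, and the weighting and normalization details are dropped), $B$ bootstrap statistics $M^{(r)}$ are sorted and the p-value $1 - Q^*/Q$ is read off.

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 200, 500
tt = np.linspace(0.0, 1.0, n)

def sit_stat(delta):
    """Toy L2 (SIT-type) statistic: Riemann approximation of int delta^2 dt."""
    return float(np.mean(delta**2))

observed = sit_stat(np.zeros(n))   # zero curve difference, mimicking H0 exactly

# Gaussian smoothing weights standing in for the structure of Z(t, {V_j})
bw = 0.1
kern = np.exp(-0.5 * ((tt[:, None] - tt[None, :]) / bw) ** 2)
kern /= kern.sum(axis=1, keepdims=True)

# B multiplier draws: one i.i.d.-normal vector per replicate, then smooth
M = np.sort([sit_stat(kern @ rng.normal(size=n)) for _ in range(B)])
Q_star = np.searchsorted(M, observed, side="right")   # #{r : M^(r) <= T}
p_value = 1.0 - Q_star / B
```

With an exactly zero observed difference the p-value is 1, i.e., the test does not reject; a genuine implementation would simulate $Z$ from the estimated $M_{c_s}$ and the kernels of Section 4.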
Notice that the cutoff values $M^{(\lceil Q(1-\alpha)\rceil)}$ and $\tilde{M}^{(\lceil Q(1-\alpha)\rceil)}$ are obtained for fixed $n_1$ and $n_2$, while the critical values in Theorems 3.1 and 3.2 are based on the limiting distribution. Therefore, similarly to Zhao and Wu (2008), we expect Algorithms 4.1 and 4.2 to outperform the tests that use the critical values of Theorems 3.1 and 3.2. Finally, to implement Algorithm 4.2, we need to estimate $K_1$, which involves estimates of $m_s^{-1}$, $(m_s^{-1})'$ and $m_s'$ for $s = 1$ and $2$. We suggest estimating these quantities by $\int_{\hat{m}_s(0)}^{t} \hat{g}_s(u)\, du$, $\hat{g}_s(u)$, and $c_s^\top \hat{\theta}'_{\tau,s}(t)$ for $s = 1$ and $2$, respectively, where $\hat{\theta}'_{\tau,s}(t) = (\hat{\theta}'_{\tau,s,1}(t), \ldots, \hat{\theta}'_{\tau,s,p}(t))^\top$. Algorithms 4.1 and 4.2 are built upon the assumption that the two data sets $\tilde{y}_{i,1} := (y_{i,1}, x_{i,1})$ and $\tilde{y}_{i,2} := (y_{i,2}, x_{i,2})$ are independent of each other. As pointed out by a referee, it is important to allow dependence between the two data sets. For this purpose, we should model the two series $\tilde{y}_{i,1}$ and $\tilde{y}_{i,2}$ jointly, assuming that they are generated from a certain $(p_1 + p_2 + 2)$-dimensional vector process $\{\zeta(t)\}_{t \in [0,1]}$. The two vectors $\tilde{y}_{i,1}$ and $\tilde{y}_{i,2}$ correspond to the first $p_1 + 1$ and the next $p_2 + 1$ components of $\zeta(t)$, respectively, but at possibly different time points. For instance, if $\tilde{y}_{i,2}$ is collected in a period subsequent to that of $\tilde{y}_{i,1}$, then one could assume that $\tilde{y}_{i,1}$ is realized from $\{\zeta(t)\}_{t \in T_1}$ and $\tilde{y}_{i,2}$ from a later index set. On the other hand, if the two series are both collected in the same period but the realization times of $\{\tilde{y}_{i,1}\}$ and $\{\tilde{y}_{i,2}\}$ are distinct, the test and its asymptotic properties will differ from the case where most of the observations of the two series are generated at the same time points. An exhaustive discussion of our tests for two mutually dependent series is therefore prohibitive due to the page limit, and we focus on the following scenario. Let $n_1 \geq n_2$; $\tilde{y}_{i,1}$ is realized at time $i/n_1$ for $1 \leq i \leq n_1$, and $\tilde{y}_{i,2}$ is realized at time $\lfloor i n_1/n_2 \rfloor / n_1$ for $1 \leq i \leq n_2$.
Suppose that $M_{c_{21}}(t) = M_{c_{12}}(t)$, and redefine the two Gaussian processes $Z_s(t, \{V_{j,s}\}_{j \in \mathbb{Z}})$, $s = 1, 2$, of Section 4.3 with respect to i.i.d. standard normals $\{V_{j,s}\}_{j \in \mathbb{Z}, s=1,2}$. The following theorem describes the asymptotic properties. Theorem 5.1 Consider the two possibly correlated time series $\tilde{y}_{i,1}$ and $\tilde{y}_{i,2}$ with sample sizes $n_1 \geq n_2$, where $\tilde{y}_{i,1}$ is realized at time $i/n_1$ for $1 \leq i \leq n_1$ and $\tilde{y}_{i,2}$ is realized at $\lfloor i n_1/n_2 \rfloor / n_1$ for $1 \leq i \leq n_2$; then the stated conclusions hold. To implement Theorem 5.1, we shall estimate the quantity $M_c(t)$ in (5.36), which consists of $\Sigma_s(t)$ for $s = 1, 2$ and $V(t)$ of (5.35). The former can be estimated via (4.24), and the latter can be estimated by a window estimator (say for $n_1 \geq n_2$), where $M$ is the window size such that $M \to \infty$ and $M = o(n_2)$. In the supplemental material, we prove Theorem 5.1 by carefully establishing the asymptotic convergence and calculating the asymptotic formulas of $T_{n_1,n_2}$ and $(\widehat{m^{-1}})'$ under the new conditions of Theorem 5.1. In particular, we show that when the two series are correlated, under the null hypothesis of shift invariance, the asymptotic behaviour of the above two statistics is very different between the scenarios $d = 0$ and $d \neq 0$, while the new algorithms of Theorem 5.1 are adaptive to the two scenarios, i.e., the asymptotic correctness of the algorithms holds under both. This section studies the finite-sample performance of the SIT test and the SCB test. The performance of the tests is evaluated for $n_1 = n_2 = n = 50, 100, 200$ and $500$, with $1000$ repetitions and $B = 500$ bootstrap replications. In this study, we use the Epanechnikov kernel (see, e.g., Silverman (1998)) unless mentioned otherwise, and the upper limit of the Riemann sum equals the sample size, i.e., $N = n$. Apart from these choices, we set $h_s = n_s^{-1/3}$ and choose $b_{n,1}$ and $b_{n,2}$ as described in Section 4.2.
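The window-based long-run variance estimation indicated above (window size $M \to \infty$, $M = o(n_2)$) can be sketched for a scalar sequence. The AR(1) example and the non-overlapping-block form below are illustrative simplifications, not the estimator (5.35) itself.

```python
import numpy as np

def long_run_var(u, M):
    """Block (window) estimate of the long-run variance: average of squared
    normalized sums over non-overlapping blocks of length M."""
    n = len(u)
    blocks = [u[i:i + M].sum() / np.sqrt(M) for i in range(0, n - M + 1, M)]
    return float(np.mean(np.square(blocks)))

rng = np.random.default_rng(3)
n, M = 20000, 50
a = 0.5                       # AR(1) coefficient; long-run variance = 1/(1-a)^2 = 4
eps = rng.normal(size=n)
u = np.empty(n)
u[0] = eps[0]
for i in range(1, n):
    u[i] = a * u[i - 1] + eps[i]

lrv = long_run_var(u - u.mean(), M)   # should be near 4
```

For locally stationary data, as here, such block sums are computed locally in time, which is why the window must grow slowly relative to the sample size.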
The covariate and error random variables are generated as follows. Let $Q_\tau(\cdot)$ be the $\tau$-th quantile of a random variable. For $s = 1$ and $2$, consider the $p_s$-dimensional covariate $x_{i,s}$ and the error $e_{i,s}$, where $\mathcal{G}_{i,s} = (\zeta_{-\infty,s}, \ldots, \zeta_{i,s})$ and $\zeta_{i,s} = (\zeta_{i,1,s}, \ldots, \zeta_{i,p_s,s})$ are independent random vectors, with $\{\zeta_{i,j,s}\}_{i \in \mathbb{Z}, s=1,2}$ jointly independent random variables following $\chi^2_j/j$, where $\chi^2_j$ denotes a $\chi^2$ distribution with $j$ degrees of freedom. The innovations $\{\varepsilon_{k,1}\}_{k \in \mathbb{Z}}$ follow the standard normal distribution, and $\{\varepsilon_{k,2}\}_{k \in \mathbb{Z}}$ follow $t_5/\sqrt{5/3}$, where $t_5$ denotes the Student's $t$ distribution with 5 degrees of freedom, standardized to unit variance. The nonlinear filters $L$ and $H$ are defined for the errors $e_i$ and for the covariates, with $H_s = (H_{1,s}, \ldots, H_{p_s,s})^\top$ for $s = 1$ and $2$ and the $j$-th covariate, $1 \leq j \leq p_s$. Example 1: Consider $c_{1,1} = c_{2,1} = c_{1,2} = c_{2,2} = 1$; in the numerical studies we consider $\tau = 0.5$, $0.7$ and $0.8$. Example 2: Suppose that $\theta_{1,\tau,1}(t) = t^2$, $\theta_{2,\tau,1}(t) = \sin(\pi t/2)$, $\theta_{3,\tau,1}(t) = e^t$, and $\theta_{1,\tau,2}(t) = (t-0.1)^2$, $\theta_{2,\tau,2}(t) = \sin(\pi(t-0.1)/2)$ and $\theta_{3,\tau,2}(t) = e^{t-0.1}$. Further, consider $c_{1,1} = c_{2,1} = c_{3,1} = c_{1,2} = c_{2,2} = c_{3,2} = 1$; here also $\tau = 0.5$, $0.7$ and $0.8$ are considered in the numerical study. Note that for both Examples 1 and 2, the choices of the time varying coefficients (i.e., the $\theta(t)$'s) satisfy the null hypothesis described in (1.3). Tables 1 and 2 show the rejection probabilities of the SIT test and the SCB test for Examples 1 and 2, respectively, at the 5% and 10% levels of significance. Table 1: The estimated size of the SIT test for different sample sizes $n_1 = n_2 = n$; the levels of significance (denoted $\alpha$) are 5% and 10%. For the power study, we consider the same error and covariate processes and the following examples. Example 3: Suppose that $\theta_{1,\tau,1}(t) = t$, $\theta_{2,\tau,1}(t) = \log t$, and $\theta_{1,\tau,2}(t) = t^2$, $\theta_{2,\tau,2}(t) = (\log t)^2$.
Further, consider c 1,1 = c 2,1 = c 1,2 = c 2,2 = 1. In the numerical studies, we consider τ = 0.5, 0.7 and 0.8. Example 4: Let … Suppose that θ 1,τ,1 (t) = t 2 , θ 2,τ,1 (t) = sin(πt/2), θ 3,τ,1 (t) = e t , and θ 1,τ,2 (t) = t 3 , θ 2,τ,2 (t) = cos(πt/2) and θ 3,τ,2 (t) = log t. Further, consider c 1,1 = c 2,1 = c 3,1 = c 1,2 = c 2,2 = c 3,2 = 1, and here also τ = 0.5, 0.7 and 0.8 are considered in the numerical study. Table 3: The estimated power of the test based on T n 1 ,n 2 , i.e., the SIT test, for different sample sizes n 1 = n 2 = n. The levels of significance (denoted as α) are 5% and 10%. Table 4: The estimated power of the SCB test for different sample sizes n 1 = n 2 = n. The levels of significance (denoted as α) are 5% and 10%. It follows from the results of Examples 1 and 2 that the test based on T n 1 ,n 2 , i.e., the SIT test, and the SCB test achieve the nominal level of significance when τ = 0.5, 0.7 and 0.8. In terms of estimated power, the results of Examples 3 and 4 indicate that the SIT and the SCB tests attain maximum power as the sample size increases. Precisely speaking, for Example 3 the SIT test is marginally more powerful than the SCB test, whereas for Example 4 the SCB test is faintly more powerful than the SIT test. We also observe the same phenomena for unequal n 1 and n 2 , but for the sake of concise presentation, we have not reported the corresponding estimated sizes and powers here. Finally, as one reviewer pointed out, we discuss the performance of the SIT and the SCB tests when m 1 (t) and m 2 (t) are non-monotone. Let us consider the following two examples; the results are summarized in Table 5. Example NM1: Let … Consider c 1,1 = c 1,2 = 1. In the numerical studies, we consider τ = 0.5. Example NM2: Let y 1,i = θ 1,τ,1 (i/n) x 1,i,1 + e i,τ,1 and y 2,i = θ 1,τ,2 (i/n) x 1,i,2 + e i,τ,2 . Suppose that θ 1,τ,1 (t) = t(0.5 − t)(1 − t), and θ 1,τ,2 (t) = t(0.5 − t) 2 (1 − t).
Further, consider c 1,1 = c 1,2 = 1. In the numerical studies, we consider τ = 0.5. Note that in both Example NM1 and Example NM2, θ 1,τ,1 (t) is a non-monotone function of t ∈ (0, 1). The study in Section 4.2 indicates that the performance of the SIT and the SCB tests depends on various tuning parameters, namely, b n,s , h s , w n,s and M s . In this section, we carry out the power and the level study for various choices of b n,s (i.e., the bandwidth) and w n,s . Tables 8 and 9 indicate that the power does not vary by more than 4% across the various choices of b n,s and w n,s . It has already been observed that the SIT and the SCB tests perform well for two independent data sets. In this section, we investigate the finite-sample performance of the SIT and the SCB tests when the two compared data sets are dependent, as often happens in practice. Here we consider the same models as in Examples 1, 2, 3 and 4 of Section 6, but we generate the data in a different way so that the data sets become dependent. Specifically, we generate dependent errors in the following way: e i = (e i,1 , e i,2 ) = (0.8 L 1 (i/n, F i,1 ) + 0.2 L 2 (i/n, F i,2 ), 0.2 L 1 (i/n, F i,1 ) + 0.8 L 2 (i/n, F i,2 )), where all notation is as defined at the beginning of Section 6, and the covariates are generated in the same way as described in Section 6. In this study, we choose the same set of tuning parameters and sample sizes (i.e., n 1 and n 2 ) as in Section 6.1. All results are reported in Section F of the supplementary materials. It follows from those results that when the models are the same as in Examples 1 and 2, i.e., the null hypothesis is true, the estimated sizes of the SIT test and the SCB test based on Theorem 5.1 deviate by no more than 1% from the estimated sizes obtained when the data sets are mutually independent.
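The dependence mechanism above, mixing the two error filters with weights 0.8 and 0.2, can be sketched directly. In this toy version the filters L 1 and L 2 are replaced by independent unit-variance series purely for illustration; under that assumption the induced cross-correlation is 2(0.8)(0.2)/(0.8² + 0.2²) ≈ 0.47.

```python
import numpy as np

def mix_errors(L1, L2, a=0.8, b=0.2):
    """Construct dependent error pairs e_i = (a*L1 + b*L2, b*L1 + a*L2),
    mirroring the mixing scheme used to make the two data sets dependent.
    L1 and L2 stand in for the filters L_1 and L_2 of the paper."""
    return a * L1 + b * L2, b * L1 + a * L2
```

With independent standard normal inputs, the sample correlation between the two error series should be close to 0.47, confirming the built-in dependence.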
Next, when the models are the same as in Examples 3 and 4, i.e., the alternative hypothesis is true, the estimated powers of the SIT test and the SCB test based on Theorem 5.1 deviate by no more than 6% from the estimated powers obtained when the data sets are mutually independent. This data set consists of two variables, namely, the cumulative number of infected cases and the cumulative number of deaths due to the COVID-19 outbreak in a particular country for the period from December 31, 2019 to October 7, 2020, i.e., n = 282 days. We here consider two countries, namely, France and Germany, as they are from the same continent. Our analysis is based on the log-transformed data, since the data vary from small values to quite large values. The data set is available at https://ourworldindata.org/coronavirus-source-data. The analysis has three parts, namely, (A) analysis of cumulative infected cases and deaths in France due to the COVID-19 outbreak, (B) analysis of cumulative infected cases in France and Germany due to the COVID-19 outbreak and (C) analysis of cumulative deaths in France and Germany due to the COVID-19 outbreak. All three analyses are done for τ = 0.8, 0.5 and 0.2. In order to implement our … Table 9: The estimated power of the SCB test for different sample sizes n 1 = n 2 = n. The level of significance (denoted as α) is 5%. In each cell, from the left, the first, the second and the third values correspond to w n,s = n … Let us first discuss a few observations. The left diagram in Figure 1 indicates that both cumulative infected cases and deaths are increasing over time in France, which is expected as new cases are added to the data every day. In fact, it is observed in the right diagram in Figure 1 that the quantile curves of the cumulative infected cases and the deaths have an increasing trend over time.
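The log transformation of the cumulative series described above can be sketched as follows; using log1p rather than log is our own choice here, to guard against zeros early in the outbreak, and is not stated in the paper.

```python
import numpy as np

def log_cumulative(daily_counts):
    """Log-transformed cumulative series, as used for the COVID-19 data:
    cumulative counts range over several orders of magnitude, so logs
    stabilise the scale. log1p handles zero counts at the start."""
    return np.log1p(np.cumsum(np.asarray(daily_counts, dtype=float)))
```

The output is nondecreasing whenever the daily counts are nonnegative, matching the increasing trend visible in Figure 1.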
First, we now implement the SIT and the SCB tests on the full data; the tests are carried out using the procedure explained in … First observe that the left diagram in Figure 3 indicates that the cumulative death cases in both France and Germany are increasing over time, which is also expected as new cases are added to the data every day. In addition, it is observed in the right diagram in … This data set consists of four variables: the average temperature anomaly and the carbon emissions in the form of gas, solid and liquid. We consider two regions, namely, the northern hemisphere and the southern hemisphere, since the features of the average temperature anomaly and of the carbon emissions in the form of gas, solid and liquid differ between the two hemispheres, and they are monotonically increasing over time, which is of interest in climate science (see, e.g., Raupach et al. (2014)). The data sets for these two regions and the aforementioned four variables are available at https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions and https://cdiac.ess-dive.lbl.gov/trends/emis/glo_2014.html. These yearly data sets report the values of the variables for the period from 1850 to 2018, i.e., n = 169. In this study, the average temperature anomaly is considered as the response variable (denoted as y), and the carbon emissions in the form of gas (denoted as x 1 ), liquid (denoted as x 2 ) and solid (denoted as x 3 ) are the covariates. We now discuss a few more observations on these data. The diagrams in Figure 4 indicate that for both the northern and southern hemispheres, y, x 1 , x 2 and x 3 increase over time, which is a well-known feature in climate science. Moreover, we observe from Figure 5 that the fitted quantile coefficient curves associated with x 1 , x 2 and x 3 are monotonically increasing over time for a given quantile.
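The fitted quantile coefficient curves in Figure 5 come from local linear quantile regression. A minimal sketch of a local linear check-loss estimator with the Epanechnikov kernel used in the simulations is given below; the function, its arguments and the single-covariate setup are illustrative and not the authors' implementation, which additionally involves the bandwidths b n,s and the weight function w.

```python
import numpy as np
from scipy.optimize import minimize

def local_linear_quantile(t0, t, x, y, tau, bn):
    """Local linear estimate of theta_tau(t0) in y_i = theta(i/n) x_i + e_i:
    minimise the kernel-weighted quantile check loss around t0.
    Sketch of the standard estimator, one covariate for simplicity."""
    kernel = lambda u: 0.75 * np.maximum(1.0 - u ** 2, 0.0)  # Epanechnikov
    w = kernel((t - t0) / bn)
    rho = lambda u: u * (tau - (u < 0.0))  # quantile check loss
    def objective(beta):
        fit = (beta[0] + beta[1] * (t - t0)) * x
        return np.sum(w * rho(y - fit))
    res = minimize(objective, x0=np.array([0.0, 0.0]), method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-8, "maxiter": 2000})
    return res.x[0]  # local level, i.e. the estimate of theta_tau(t0)
```

Nelder-Mead is used because the check loss is convex but not differentiable at zero; linear-programming solvers would be the more standard choice in production code.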
We now investigate the performance of the test based on T n 1 ,n 2 (here n 1 = n 2 = n = 169). Figure 4: Plots of the temperature anomaly and the carbon (gas, liquid and solid) emissions in the northern and southern hemispheres. In each diagram, the solid curve represents the northern hemisphere, and the dashed curve represents the southern hemisphere. Figure 5: Plots of the fitted quantile coefficient curves associated with x 1 , i.e., gas (left diagram), x 2 , i.e., liquid (middle diagram) and x 3 , i.e., solid (right diagram). In each diagram, the solid curve represents the northern hemisphere, and the dotted curve represents the southern hemisphere. The red curve corresponds to τ = 0.8, the black curve corresponds to τ = 0.5, and the blue curve corresponds to τ = 0.2. The goal is to check whether these data favour H 0 (see (1.3)) or not when c = (c 1,1 , c 2,1 , c 3,1 , c 1,2 , c 2,2 , c 3,2 ) equals (1, 0, 0, 1, 0, 0) and (0, 0, 1, 0, 0, 1); the test is carried out following the procedure described earlier. Using the approximation formula in Proposition 1 of Sun and Loader (1994), we derive the proof of Theorem 3.2, which yields the simultaneous confidence band. Intensive calculations are provided in our proof to determine the parameters in the approximation formula of Sun and Loader (1994). B Proofs of Theorems 3.1 and 3.2. B.1 Proof of Theorem 3.1. Proof. Write … In the following, we shall prove that, under the conditions of Theorem 3.1, (n 1 n 2 b 5/2 n,1 b 5/2 n,2 ) 1/2 (T̃ n 1 ,n 2 − T n 1 ,n 2 ) = o p (1). Proof of (B.3): … Then we have the following decomposition: … Using an argument similar to that on page 471 of Dette et al. (2006), we have … Next, by a Taylor series expansion, we have that for s = 1 and 2, the following decomposition holds for some ν * s ∈ [−1, 1] (s = 1 and 2).
Notice that log 4/3 ns √ random variables (V s,i ) i∈Z , s = 1 and 2 such thatT n 1 ,n 2 can be written as and sup As to prove (B.3), it is sufficient to show that Proof of (B.14): We decompose Z(t) by Z(t) := Z 1 (t) − Z 2 (t). Here for s = 1 and 2, we have As a result, we have W s (m s , j, t)V j,s 2 w(t)dt, s = 1 and 2, (B.19) We first prove the results for A 1 , and the result for A 2 can be evaluated in a similar way. Notice that where R(j, s, n, t) = 1 for a sufficiently large constant M . Since H is chosen to be symmetric, we have R H (x)dx = 0. Therefore, by Taylor series expansion, for t with w(t) = 0, it follows that for s = 1 and 2, the leading term of W s (m s , j, t) can be written asW s (m s , j, t)(1 + O(b n,s + hs bn,s )), wherẽ Next, by using (B.25), we have On the other hand, the similar calculations show that Then for A 1,b , we have that due to symmetry of the R W 1 (m 1 , j 1 , t)W 1 (m 1 , j 2 , t)w(t)dt in j 1 and j 2 , Now, by changing variable, i.e., letting u−m −1 1 (t) b n,1 = x, and using the fact thatg 1 (m 1 (u)) = ǧ 2 1 (m 1 (u))w(m 1 (u))m 1 (u), we have (1)). (1)). On the other hand, for A 12 , we have By change of variable using x = (u − m −1 1 (t))/b n,1 , we have that Therefore, we have that Proof. We prove the theorem in two steps. Step 1: Recall Z(t) defined in (B.11) in the proof of Theorem 3.1 . We first evaluate E(Z 2 (t)), E(Z 2 (t)) and E(Z(t)Z (t)). Since h s = o(b n,s ), uniformly for t ∈ [a + η, b − η], we have Finally, by the symmetry of K and H, we have Step 2: We use Proposition 1 of Sun and Loader (1994) to evaluate the maximum deviation of Z(t). For any two p-dimensional vectors u = (u 1 , ...u p ) and v = (v 1 , ...., v p ) , write | < T(t), V > |. 
Therefore, by Proposition 1 of Sun and Loader (1994), we have that … By using (B.42)-(B.44) and the results on E(Z 2 (t)), E(Z 2 (t)) and E(Z(t)Z (t)) from Step 1, … Furthermore, by (D.54), we have … In this subsection we prove Proposition 2.1 of the main article, which enables us to test the null hypothesis H 0 in (1.3) by investigating (m −1 1 (u)) − (m −1 2 (u)) . Proof of Proposition 2.1: We extend the proof of Lemma 2.1 in Dette et al. (2021). If m 1 (t) = m 2 (t + d) for 0 < t < 1 − d for some unknown d, and m 1 (t) and m 2 (t) are monotonically increasing for 0 < t < 1 − d, then one can write u = m 1 (t) = m 2 (t + d) for m 1 (0) < u < m 1 (1 − d), which implies that t = m −1 1 (u) and t + d = m −1 2 (u). Therefore, … Since d is a constant, we have proven that (1.3) implies (C.48). On the other hand, by (C.48), one can see that for any t ∈ (m 1 (0), …), … As a result, we have m −1 1 (t) = m −1 2 (t) − m −1 2 (m 1 (0)), and d = m −1 2 (m 1 (0)). Therefore, by rearranging the equation and taking m 2 on both sides of it, one can conclude that … where Θ n,s is defined in the main article above condition (B1). Proof. The proposition follows immediately from equation (59) … Proof. It follows from Lemma 1 of Zhou and Wu (2010) … Then the assertion of this proposition follows from Proposition D.1. The following proposition provides the convergence rate of â and b̂ under the null and the local alternatives. … Then, if (m −1 1 ) (t) − (m −1 2 ) (t) = ρ n 1 ,n 2 κ(t) for some non-zero bounded function κ(t) and ρ n 1 ,n 2 = o(η), we have that … Notice that under the null, the LHS of (D.54) reduces to max(|â − a|, |b̂ − b|). Moreover, ii) lim n 1 →∞,n 2 →∞ P(m −1 s (t) ∈ (b n , 1 − b n ), s = 1, 2, for all t ∈ (â + c 1 η, b̂ − c 2 η)) = 1 (D.55) for any given positive constants c 1 , c 2 > 0. Remark D.1 Note that i) in Proposition D.3 shows that ŵ(t) in (2.9) is consistent under the null hypothesis and local alternatives.
Further, observe that ii) in Proposition D.3 shows that by introducing η we avoid additional bandwidth conditions, since ii) excludes the regions where m −1 s (t) (s = 1 and 2) is close to 0 or 1. Proof of Proposition D.3: Notice that (m 2 ) −1 (t) − (m 1 ) −1 (t) = ρ n 1 ,n 2 κ(t) implies that … Now, since b n 1 = o(η) and m −1 1 is differentiable, using the mean value theorem, with probability tending to 1 we have (D.59), and hence lim n→∞ P(b n 1 < m −1 1 (â + c 1 η)) = 1. Arguing in a similar way, one can establish that lim n→∞ P(1 − b n 1 > m −1 1 (b̂ − c 2 η)) = 1. Then (ii) holds since the function m −1 1 (·) is monotone. In this section, we assume that n 1 ≥ n 2 and 1 ≤ n 1 /n 2 ≤ M < ∞ for some large constant M. Here we stick to the scenario that the two series are collected over the same period, i.e., ỹ i,1 is realized at time i/n 1 for 1 ≤ i ≤ n 1 , and ỹ i,2 is realized at time ⌊i n 1 /n 2 ⌋/n 1 for 1 ≤ i ≤ n 2 . Proof. By Lemma 7 of Wu and Zhou (2017), … (n s b n,s ) −1/2 π n,s ), where π n,s is defined in Proposition D.1. By Proposition 3 of Wu and Zhou (2017), for any s × p (s ≤ p) matrix C, on a possibly richer probability space there exists an i.i.d. sequence of normal N(0, …) with ν C (t) = (CV 2 (t)C ) 1/2 . Now, define the index set A = {⌊i n 1 /n 2 ⌋, 1 ≤ i ≤ n 2 } and observe that for any bandwidth b n , we have (E.64). Let us now consider … where K(i, t) = K b n,1 (i/n 1 − t)/(nb n,1 ) K b n,2 (i/n 1 − t)/(nb n,2 ) 1(i ∈ A). Straightforward calculation shows that (E.68) implies both assertions of the proposition. Proposition E.2 Let ρ n = o(1), and assume that for t ∈ [a, b], (m −1 1 − m −1 2 ) (t) = ρ n κ(t) for some bounded function κ(t), where a = m 1 (0) ∧ m 2 (0) and b = m 1 (1) ∨ m 2 (1). Then for any u ∈ [m 1 (0) ∧ m 2 (0), m 1 (1) ∨ m 2 (1)], we have … In particular, … Notice that under H 0 , m 1 (0) = m 2 (d), so m −1 2 (m 1 (0)) = d, and by the fact that (m −1 1 − m −1 2 ) (t) = ρ n κ(t), the proposition follows.
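The identification d = m 2 −1 (m 1 (0)) used above suggests a direct numerical recipe: invert the (monotone) estimated curves by interpolation and read off the shift. Below is a sketch under the assumption that both curves are increasing and evaluated on a common grid; the function names are ours.

```python
import numpy as np

def invert_monotone(t_grid, m_vals):
    """Numerical inverse of an increasing curve t -> m(t) via linear
    interpolation, after enforcing monotonicity on the sampled values."""
    m_vals = np.maximum.accumulate(m_vals)
    return lambda u: np.interp(u, m_vals, t_grid)

def estimate_shift(t_grid, m1_vals, m2_vals):
    """Recover d in m1(t) = m2(t + d) as d = m2^{-1}(m1(0)),
    following the identification in Proposition 2.1."""
    m2_inv = invert_monotone(t_grid, m2_vals)
    return m2_inv(m1_vals[0])
```

For example, with m 2 (s) = s² and m 1 (t) = (t + 0.1)², the true shift d = 0.1 is recovered exactly up to interpolation error.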
To prove the assertion in Theorem 5.1, we shall study the asymptotic behavior of T n 1 ,n 2 andm −1 1 −m −2 (t) under the case that the two time series are correlated. The results are presented and proved in Theorem E.1 and E.2 below, respectively. Let us first consider the following variables: Define the following quantities with r, a, b, c functions of u, a * , b * ∈ R, E 1 = K 2 (u)du ǧ 2 11 (m 1 (x))w(m 1 (x))m 1 (x)dx, (E.77) C 1 = (K K (z)) 2 dz (ǧ 2 11 (m 1 (x))w(m 1 (x))m 1 (x)) 2 dx (E.78) Further define for a, b, c functions of u, z ∈ R, I(z, a, b, c) = dudy[w(m 1 (u))m 1 (u)] 2 I 2 (z, u, a, b, c, y)dudy, where I(z, u, a, b, c, y) = zǧ 11 (m 1 (u))K (x) −ǧ 21 (m 1 (u))K (a(u) + b(u)x) × ǧ 21 (m 1 (u))K (a(u + b(u)x + c(u)y)) − zǧ 12 (m 1 (u))K (y + x) dx. II(z, a, b) = z 2ǧ2 12 (m 1 (u))[w(m 1 (u))m 1 (u)] 2 II 2 (z, u, a, b, c, y)dudy, where II(z, u, a, b, y) = zǧ 11 (m 1 (u))K (x) −ǧ 21 (m 1 (u))K a + bx K (y + x)dx. III(z, a, b) = c 2 2,1,nǧ 2 11 (m 1 (u))[w(m 1 (u))m 1 (u)] 2 III 2 (z, u, a, b, y)dudy, where III(z, u, a, b, y) = − zǧ 12 (m 1 (u))K (x) +ǧ 22 (m 1 (u))K a + bx K (y + x)dx. Theorem E.1 Assume the conditions stated in (A1)-(A6), (A7'), (A8) and (B1), (B2), (B3), and that as n → ∞, n 1 /n 2 → n 1,2 ∈ (0, ∞), b n 1 /b n 2 → b 1,2 ∈ (0, ∞), n 1,2 = 1/n 2,1 , b 1,2 = 1/b 2,1 , η −1 = O(log(n 1 + n 2 )), η = o(1). Further, let (m −1 1 ) (t) − (m −1 2 ) (t) = ρ n κ(t) for some bounded function κ(t), and ρ n := ρ n 1 ,n 2 = [(n 1 b 5/2 n 1 ) −1/2 (n 2 b 5/2 n 2 ) −1/2 ] −1/2 . Assuming where, for instance,Ẽ 1−2 2 = K 2 (u)du ǧ 2 12 (m 1 (x))w(m 1 (x))m 1 (x)dx. 
(E.88) Further, definẽ V = 2C 1 n 2,1 b 5/2 2,1 + 2C 2 n 1,2 b 5/2 1,2 + 4n 2,1 b 3/2 1,2C 32 (0, b 1,2 , b 1,2 ) + 4n 2,1 b 3/2 1,2 D 12 (0, b 1,2 , b 1,2 ) −8n 2,1 b 1/2 2,1 D 13 (0, b 1,2 , b 1,2 ) − 8b 1/2 2,1 D 2,3 (0, b 2,1 , b 2,1 ), (E.89) W = n 1,2 b 11/2 1,2Ĩ (n 2,1 b 2 2,1 , 0, b 1,2 , b 1,2 ) − (n 2 1,2 − n 1,2 )b 11/2 1,2Ĩ I(n 2,1 b 2 2,1 , 0, b 1,2 ) + (n 2 1,2 − n 1,2 )b 11/2 1,2Ĩ II(n 2,1 b 2 2,1 , 0, b 1,2 ) − n 2,1 (1 − n 2,1 ) 2 b 5,2 2,1Ĩ V . 1,2Ĩ (n 2,1 b 2 2,1 , ∞, b 1,2 , b 1,2 ) − (n 2 1,2 − n 1,2 )b 11/2 1,2Ĩ I(n 2,1 b 2 2,1 , ∞, b 1,2 ) + (n 2 1,2 − n 1,2 )b 11/2 1,2Ĩ II(n 2,1 b 2 2,1 , ∞, b 1,2 ) − n 2,1 (1 − n 2,1 ) 2 b 5,2 2,1Ĩ V . Proof. Notice that Propositions 2.1 and D.3 still hold. Therefore, following the proof of Theorem 3.1, we shall see that (n 1 n 2 b 5/2 n,1 b 5/2 n,2 ) 1/2 (T n 1 ,n 2 − T n 1 ,n 2 ) = o p (1). (E.95) HereT n 1 ,n 2 is defined in (B.2), and applying Proposition E.1 instead of Proposition D.1, we shall see that the decomposition ofT n 1 ,n 2 in (B.10) can be written as where Z(t) is defined in (B.11) replaced by Here For the remaining term R(t), it satisfies (B.12). Again following the proof of Theorem 3.1, we shall see that (n 1 n 2 b 5/2 In the following, we shall show that for m −1 2 (m 1 (0)) = 0, and (n 1 b 5/2 n,1 n 2 b 5/2 n,2 ) 1/2 R Z 2 (t)w(t)dt − (b n,1 b n,2 ) −1/4 (B +B 1−2 ) ⇒ N (0,Ṽ +Ṽ 1−2 + 4W ). (E.100) for |m −1 2 (m 1 (0))| > 0. We now calculate the asymptotic variance and bias. To ease the notation, write Similarly to the proof of Theorem 3.1, W uv (j, t) can be written asW uv (j, t)(1 + O(b n,u + hu bn,u ))) for u, v = 1, 2, wherẽ since for notational simplicity we have defined M c 12 (t) = M c 21 (t). DefineZ 1 (t) = Z 11 (t) − Z 21 (t) andZ 2 (t) = Z 22 (t) − Z 12 (t), so W 21 (j 2 , t)V n 1 j 2 /n 2 ,1 w(t)dt := A + B − 2C, where A, B and C are defined in an obvious manner. 
Following the proof of Theorem 3.1, we shall see that ) (E.110) When j 1 = j 3 , j 2 = j 4 and j 1 = n 1 j/n 2 , we have )w(t)dt] 2 dudv(1 + o(1)) = 1 n 1 n 2 b 2 n,1 b 3 n,2 [ K (x)K (y + m 21 (u) b n,1 x b n,2 )dx] 2 × [ǧ 11 (m 1 (u))ǧ 21 (m 1 (u))w(m 1 (u))m 1 (u)] 2 dudy(1 + o(1)). Similarly, since W 21 ( n 1 j 4 n 2 , j 2 ) = 1 n 1 n 2 b 2 n,1 b 2 n,2 ǧ 11 (t)ǧ 21 (t)K ( j 4 /n 2 − m −1 1 (t) b n,1 ) K ( j 2 /n 2 − m −1 2 (t) b n,2 )w(t)dt(1 + O( 2 s=1 (b n,s + h s b n,s ))) (E.112) and similarly expression holds for W 21 ( n 1 j 2 n 2 , j 4 ), we have W 21 (j 1 , j 2 )V j 1 ,1 V n 1 j 2 n 2 W 21 (j 3 , j 4 )V j 3 ,1 V n 1 j 4 n 2 1(j 1 = n 1 j 4 n 2 , j 3 = n 1 j 2 n 2 , j 1 = n 1 j 2 n 2 ) = 1 n 2 1 b 4 n,1 b 4 n,2 )w(t)dt dudv(1 + o(1)) = 1 n 2 1 b n,1 b 4 n,2 (ǧ 11 (m 1 (u))ǧ 22 (m 1 (u))w(m 1 (u))m 1 (u)) 2 × K (x)K ( u − m 21 (u) b n,2 + b n 1 b n 2 y + m 21 (u) b n,1 b n,2 x)dx W 21 (j 1 , j 2 )V j 1 ,1 V n 1 j 2 n 2 W 21 (j 3 , j 4 )V j 3 ,1 V n 1 j 4 n 2 1(j 1 = n 1 j 4 n 2 , j 3 = n 1 j 2 n 2 = j 1 ) Using similar argument, we obtain hand, notice that Z 2 (t)w(t)dt = (Z 2 1 (t) +Z 2 2 (t) − 2Z 1 (t)Z 2 (t))w(t)dt. It is obvious that E Z 1 (t)Z 2 (t)w(t)dt = 0, Cov( Z 2 1 (t)w(t)dt, Z 2 2 (t)w(t)dt) = 0, Cov( Z 2 1 (t)w(t)dt, Z 1 (t)Z 2 (t)w(t)dt) = 0, and Cov( Z 2 2 (t)w(t)dt, Z 1 (t)Z 2 (t)w(t)dt) = 0. It remains to calculate V ar( Z 1 (t)Z 2 (t)w(t)dt). Without loss generality consider n 1 /n 2 = k ∈ Z where k ≥ 1. Then, we havẽ [W 11 (j, t) − W 21 (j/k)1(j/k ∈ Z)]V j,1 := n 1 j=1W 1 (j, t)V j,1 , Z 2 (t) = n 1 j 1 =1 [W 22 (j/k, t)1(j/k ∈ Z) − W 12 (j, t)]V j,2 := n 1 j=1W 2 (j, t)V j,2 , whereW 1 (j, t) andW 2 (j, t) are defined in an obvious way. Further, definẽ W 11 (j, t) = W 11 (j, t) − W 21 (j/k, t), (E.122) W 22 (j, t) = W 22 (j/k, t) − W 12 (j, t). (E.123) Let us now define W 1 (j 1 , t)W 2 (j 2 , t)w(t)dtV j 1 ,1 V j 2 ,1 . 
[ W 1 (j 1 , t)W 2 (j 2 , t)w(t)dt] 2 = n 1 j 1 =1 n 1 j 2 =1 W 11 (j 1 , t)1(j 1 /k ∈ Z) + W 11 (j 1 , t)1(j 1 /k ∈ Z) W 22 (j 2 , t)1(j 2 /k ∈ Z) − W 12 (j 2 , t)1(j 1 /k ∈ Z)) w(t)dt W 11 (j 1 , t)W 22 (j 2 , t)1(j 1 /k ∈ Z)1(j 2 /k ∈ Z)w(t)dt 2 , (E.125) W 11 (j 1 , t)W 12 (j 2 , t)1(j 1 /k ∈ Z)1(j 2 /k ∈ Z)w(t)dt W 11 (j 1 , t)W 22 (j 2 , t)1(j 1 /k ∈ Z)1(j 2 /k ∈ Z)w(t)dt 2 , (E.127) W 11 (j 1 , t)W 12 (j 2 , t)1(j 1 /k ∈ Z)1(j 2 /k ∈ Z)w(t)dt For II, by Riemann sum approximation, we see that II = (k − 1)n 2 1 k 2 (n 2 b 2 n,2 ) 4 c 2,1,nǧ11 (t)K ( u − m −1 1 (t) b n,1 ) −ǧ 21 (t)K ( u − m −1 2 (t) b n,2 )) ×c 2,1,nǧ12 (t)K ( v − m −1 1 (t) b n,1 )w(t)dt (1987) and the independence of centered Z 2 1 (t)w(t)dt, Z 2 2 (t)w(t)dt and Z 1 (t)Z 2 (t)w(t)dt. Therefore, (E.99) and (E.100) hold and the proof of theorem follows. Theorem E.2 Assume the conditions stated in (A1)-(A6), (A7'), (A8) and (B1), (B2), (B3), and suppose that as n → ∞, n 1 /n 2 → n 1,2 ∈ (0, ∞), b n 1 /b n 2 → b 1,2 ∈ (0, ∞), h 2 /h 1 → h 1,2 ∈ (0, ∞), n 1,2 = 1/n 2,1 , b 1,2 = 1/b 2,1 , h 1,2 = 1/h 2,1 , η −1 = O(log(n 1 + n 2 )), η = o(1). For u = 1, 2 and v = 1, 2, leť g • uv (t) = M c uv (m −1 u (t))((m −1 u ) (t)) R H (y)ydy. (E.130) • 1 (t) = (n 2,1 b 3 2,1 ) 1/2ǧ• 2 11 (t) + (n 1,2 b 1,2 ) 1/2ǧ• 2 21 (t) 2 K 2 (x)dx −2ǧ • 2 11 (t)ǧ • 2 21 (t)n 1/2 2,1 b 1/2 1,2 K (x)K (b 1,2 x)dx1(m 21 (0) = 0), (E.131) K • 2 (t) = (n 2,1 b 2,1 h 4 2,1 ) 1/2ǧ• 2 11 (t) + (n 1,2 b 1,2 h 4 1,2 ) 1/2ǧ• 2 21 (t) 2 K 2 (x)dx −2ǧ • 2 11 (t)ǧ • 2 21 (t)n 1/2 2,1 b 1/2 1,2 K (x)K(b 1,2 x)dx1(m 21 (0) = 0), (E.132) and consequently, Now, if (m −1 1 ) (t) − (m −1 2 ) (t) = ρ n 1 ,n 2 κ(t) for some non-zero bounded function κ(t) and ρ n 1 ,n 2 = o(η ∧ b n,1 ), then as min(n 1 , n 2 ) → ∞, we have P sup where Iâ ,b = (â + η,b − η) and κ 0 = (b n,1 b n,2 ) 1/2 h 1 h 2 m 1 (1−m −1 2 (m 1 (0))) m 1 (0) K 2 (t) K 1 (t) Proof. Without loss of generality, consider n 1 ≥ n 2 and that n 1 /n 2 = k for some integer k. 
Following the proof of Theorem 3.2, it suffices to evaluate E(Z 2 (t)), E(Z 2 (t)) and E(Z(t)Z (t)), where now Z(t) is defined in (E.97). Once these quantities are obtained, the theorem follows from an application of Proposition 1 of Sun and Loader (1994) and the same argument as given in the proof of Theorem 3.2. Recall the decomposition of Z(t) in (E.105): EZ 2 (t) = EZ 2 1 (t) + EZ 2 2 (t), (E.137) and recall the definition of W uv (j, t), u, v ∈ {1, 2}, in the proof of Theorem E.1. EZ 2 1 (t) = n 1 j=1 [W 11 (j, t) − W 21 (j/k, t)1(j/k ∈ Z)] 2 = n 1 j=1 W 2 11 (j, t) + n 1 j=1 W 2 21 (j/k, t)1(j/k ∈ Z) − 2 n 1 j=1 W 11 (j, t)W 21 (j/k, t)1(j/k ∈ Z). (E.138) By a similar but simpler argument than in the proof of Theorem E.1, we have W 11 (j, t)W 21 (j/k, t)1(j/k ∈ Z) = ǧ 11 (t)ǧ 21 (t) n 1 b n,1 b 2 n,2 × K ( m −1 1 (t) − m −1 2 (t) + xb n,1 b n,2 )dx (1 + o(1)). Therefore, E(Z 2 1 (t)) = 1 n 1 b 3 n,1 ǧ 2 11 (t) K 2 (x)dx + 1 n 2 b 3 n,2 ǧ 2 21 (t) K 2 (x)dx − 2ǧ 11 (t)ǧ 21 (t) n 1 b n,1 b 2 n,2 K (x)K ( m −1 1 (t) − m −1 2 (t) + xb n,1 b n,2 )dx (1 + o(1)), (E.142) E(Z 2 2 (t)) = (E(Z 2 1 (t))) 1−2 . (E.143) To calculate E(Z (t)) = E(Z 2 1 (t)) + E(Z 2 2 (t)), similarly to the proof of Theorem 3.1, ∂ ∂t W uv (j, t) can be written as W̃ uv (j, t)(1 + o(1)) for u, v = 1, 2, where … In Section 5.2 of the main manuscript, we briefly discuss the results of the simulation studies for dependent data sets; the detailed results are reported here. The results in Tables 10, 11, 12 and 13 indicate that the estimated sizes of the SIT test and the SCB test (described in Section 5 of the main article) deviate by no more than 1% from the estimated sizes obtained when the data sets are independent, and the estimated powers of the SIT test and the SCB test deviate by no more than 6% from the estimated powers obtained when the data sets are independent.
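The size and power entries compared here are Monte Carlo rejection probabilities, i.e., the fraction of replications in which the test rejects at level α. The generic loop is sketched below with illustrative names; under the null, a correctly calibrated test has a uniform p-value, so the estimated size should be close to α.

```python
import numpy as np

def rejection_probability(p_value_fn, n_rep, alpha, rng):
    """Monte Carlo rejection probability: fraction of replications
    whose p-value falls below the nominal level alpha."""
    rejections = sum(p_value_fn(rng) < alpha for _ in range(n_rep))
    return rejections / n_rep
```

In the paper's studies, n_rep = 1000 and each p-value is itself obtained from B = 500 wild bootstrap replications.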
[Residue of Tables 10-13 (column headers: model, n = 100, n = 500; panels for Examples 1 and 3 with α = 5%, τ = 0.5, b n,s = n −1/4 s).]

References (titles as recovered):
- Nonparametric estimates of regression quantiles and their local Bahadur representation
- Curve registration by nonparametric goodness-of-fit testing
- Smoothing noisy data with spline functions
- Fitting time series models to nonstationary processes
- Towards a general theory for nonlinear locally stationary processes
- A central limit theorem for generalized quadratic forms
- Identifying shifts between two regression curves
- A simple nonparametric estimator of a strictly monotone regression function
- Non-crossing non-parametric estimates of quantile curves
- Comparing conditional quantile curves
- Detecting relevant changes in the mean of nonstationary processes: a mass excess approach
- Prediction in locally stationary time series
- Regression percentiles using asymmetric squared error loss
- Semi-parametric estimation of shifts
- Regression rank scores and regression quantiles
- A lack-of-fit test for quantile regression
- An adaptive, rate-optimal test of linearity for median regression models
- Quantile regression with varying coefficients (The Annals of Statistics)
- Regression quantiles
- Robust tests for heteroscedasticity based on regression quantiles
- Quantile smoothing splines
- Nonparametric comparison of several regression functions: exact and asymptotic theory
- Testing for structural change in regression quantiles
- Nonparametric estimation and inference on conditional quantile processes
- Sharing a quota on cumulative carbon emissions
- Trimmed least squares estimation in the linear model
- Improvement of kernel type density estimators
- Simultaneous confidence bands for linear regression and smoothing
- Nonparametric quantile estimation
- Efficient estimation for a subclass of shape invariant models
- Nonparametric inference for time-varying coefficient quantile regression
- Gradient-based structural change detection for nonstationary time series M-estimation
- Simultaneous quantile inference for non-stationary long-memory time series
- Inference of trends in time series
- Local linear quantile regression
- Confidence bands in nonparametric time series regression
- A consistent nonparametric test of parametric regression models under conditional quantile restrictions
- Nonparametric inference of quantile curves for nonstationary time series
- Local linear quantile estimation for nonstationary time series
- Simultaneous inference of linear models with time varying coefficients

Cov(A, C) = Cov( n 1 j 1 =1 n 2 j 2 =1 W 11 (j 1 , t)W 11 (j 2 , t)w(t)dtV j 1 ,1 V j 2 ,1 , n 1 j 3 =1 n 2 j 4 =1 W 11 (j 3 , t)W 21 (j 4 , t)w(t)dtV j 3 V n 1 j 4 /n 2 ,1 ) and … Therefore, … It is easy to see that E Z 2 2 (t)w(t)dt and Var Z 2 2 (t)w(t)dt have the same form as E Z 2 1 (t)w(t)dt and Var Z 2 1 (t)w(t)dt with the indices 1 and 2 exchanged. On the other hand, … Here ǧ • uv (t) is defined in a straightforward way. Using this fact, it follows that … (E.148) E(Z 2 2 (t)) = (E(Z 2 1 (t))) 1−2 (E.149). Following the proof of Theorem 3.2, we have E(Z(t)Z (t)) = O(