key: cord-0183055-pgqk1vey authors: Lin, Yicong; Reuvers, Hanno title: Cointegrating Polynomial Regressions with Power Law Trends: Environmental Kuznets Curve or Omitted Time Effects? date: 2020-09-04 journal: nan DOI: nan sha: f8dbbe570991a37a23678ab587cceb4cd9e8b602 doc_id: 183055 cord_uid: pgqk1vey The environmental Kuznets curve predicts an inverted U-shaped relationship between environmental pollution and economic growth. Current analyses frequently employ models which restrict nonlinearities in the data to be explained by the economic growth variable only. We propose a Generalized Cointegrating Polynomial Regression (GCPR) to allow for an alternative source of nonlinearity. More specifically, the GCPR is a seemingly unrelated regression with (1) integer powers of deterministic and stochastic trends for the individual units, and (2) a common flexible global trend. We estimate this GCPR by nonlinear least squares and derive its asymptotic distribution. Endogeneity of the regressors will introduce nuisance parameters into the limiting distribution but a simulation-based approach nevertheless enables us to conduct valid inference. A multivariate subsampling KPSS test is proposed to verify the correct specification of the cointegrating relation. Our simulation study shows good performance of the simulated inference approach and subsampling KPSS test. We illustrate the GCPR approach using data for Austria, Belgium, Finland, the Netherlands, Switzerland, and the UK. A single global trend accurately captures all nonlinearities leading to a linear cointegrating relation between GDP and CO2 for all countries. This suggests that the environmental improvement of the last years is due to economic factors different from GDP. On page 370 of their seminal paper, Grossman and Krueger (1995) conclude: "Contrary to the alarmist cries of some environmental groups, we find no evidence that economic growth does unavoidable harm to the natural habitat. Instead we find that while increases in GDP may be associated with worsening environmental conditions in very poor countries, air and water quality appear to benefit from economic growth once some critical level of income has been reached." The quote above suggests an inverted U-shaped relationship between environmental degradation and economic growth. This relationship is currently known as the Environmental Kuznets Curve (EKC) and it forms an active research area. Its relevance becomes clear if we look at some forecasts of long-run economic growth. The projected GDP per capita growth of the world is about 2.1% per year for the next decades (chapter 3 in Nordhaus (2013); Gillingham and Nordhaus (2018) ) and this growth is partially powered by carbon-based energy resources, water usage, and material consumption. In absence of an EKC, economic growth will place more and more stress on the environment. Alternatively, if the EKC exists, then the inverted U-shape eventually implies a turning point after which economic growth and environmental improvement go hand in hand. Due to such considerations, there is now, some 25 years after its first conception, a rich literature that (1) reports on the experimental evidence on the existence/nonexistence of the EKC, (2) provides economic theory to explain the EKC, and/or (3) refines the econometric tools that are used to analyse the EKC. 1 To quantify the volume of the literature we have entered the search query "Environmental Kuznets Curve" into the Web of Science: more than 4,200 references are found. 2 Driven by contradictory empirical results as well as the variability in estimated turning points, the EKC has been criticised on two main points. First, the income variable was initially treated as a 1 Further references to these specific areas of research can be found in the review articles by Dasgupta et al. (2002) , Stern (2004) , and Carson (2009) stationary variable whereas later research shows that the unit root hypothesis often cannot be rejected (see Galeotti et al. (2009), p. 553; Stern (2017) , p. 14-15). Nonstationarity has further implications because EKC regressions include higher integer powers of GDP as well. This combination of nonstationarity and nonlinearity places the EKC in the nonlinear cointegration literature and appropriate econometric techniques should be employed. Such techniques have been developed in Wagner (2015) (2016)). Whereas such modelling approaches do allow for a more flexible relationship, they also implicitly assume that nonlinear environmental effects are solely attributable to economic growth. Relevant variables are thus potentially missing from the model specification. Such omitted variables are a valid concern because advances in green technology, pollution policy, and environmental awareness, may all influence pollution levels. However, such data is typically available for short time spans only (and for that reason often excluded from the model). Time effects can control for time-variation in unobserved effects (Vollebergh et al. (2009) ). Current developments on nonlinear cointegration emphasize the role of the nonstationary regressor yet pay less attention to time effects. Time effects are important. The small simulation exercise in Table 3 Stypka et al. (2017) reiterate the need to model the income variable as nonstationary. It is less important to use an estimation procedure acknowledging the fact that several integer powers of the same integrated process appear as regressors. That is, Stypka et al. (2017) find that the "standard estimator" which treats higher order powers of the integrated regressor as additional I(1) variables has the same limiting distribution as the CPR estimator. The rejection rate (in %) when testing H 0 : φ 2,1 = φ 2,2 = φ 2,3 = 0. (A) Falsely inflated rejections of H 0 : φ 2,1 = φ 2,2 = φ 2,3 = 0 when time effects are omitted. (B) Adding an additional global deterministic trend to the model specification hardly influences the power of the test H 0 : φ 2,1 = φ 2,2 = φ 2,3 = 0. That is, significant coefficients in front of x 2 i,t remain significant after adding a redundant flexible global trend. DGP y i,t = τ g t θ + τ 1,i + τ 2,i t + φ 1,i x i,t + u i,t y i,t = τ 1,i + τ 2,i t + φ 1,i x i,t + φ 2 x 2 i,t + u i,t Model y i,t = τ 1,i + τ 2,i t + φ 1,i x i,t + φ 2,i x 2 i,t + u i,t Correct Specification 1 illustrates the point. Foreshadowing our proposed model, we consider a multivariate setting with a global nonlinear, smooth time trend. The global trend is omitted by the researcher and a quadratic EKC-specification is estimated: y i,t = τ 1,i + τ 2,i t + φ 1,i x i,t + φ 2,i x 2 i,t + u i,t (i = 1, . . . , 3), where x i,t and y i,t are unit-specific variables measuring income and environmental pollution, respectively. We test H 0 : φ 2,1 = φ 2,2 = φ 2,3 = 0 because a significantly negative coefficient in front of x 2 i,t is typically interpreted as evidence of an EKC. 4 Panel (A) reveals exacerbated rejection frequency as curvature caused by the global deterministic trend is mistakenly interpreted as curvature caused by the income variable. In other words, negative and significant coefficients in front of squared GDP are possibly caused by omitted nonlinear deterministic trends rather than being indicative of an EKC. To be on the safe side, we recommend researchers to include a nonlinear trend component in their model specification. If unnecessary, then this is rather innocuous. Indeed, Panel (B) of Table 1 shows that significant results for nonlinear economic growth effects continue to be found with modest losses in statistical power. The contributions of this paper are fourfold. First, we propose the Generalized Cointegrating Polynomial Regression (GCPR). This multivariate model features a global power law trend to capture time effects. Power law trends have been employed to model non-constant growth rates in technology indices (Duggal et al. (1999); Duggal et al. (2007) ) and production functions (Klein et al. (2004) ). Within the EKC context, this flexible trend can capture common time effects that are implicit in omitted variables. Alternatively, as in Li and Linton (2020) , the reader can view the global flexible trend as an outside option (next to the income variable) to describe nonlinearities in the data. Limiting distributions for estimators in models with purely deterministic power law trends have been reported in Phillips (2007) , Robinson (2012) and Gao et al. (2020) . The presence of integrated variables requires an alternative asymptotic framework. Moreover, due to endogeneity, approaches assuming pre-determined integrated regressors (Park and Phillips (1999) ; Park and Phillips (2001); Chang et al. (2001) ) are inappropriate and we instead opt for a proof along the lines of Chan and Wang (2015) . The resulting limiting distribution is non-standard because (1) the scaling matrix with convergence rates is nondiagonal and parameter dependent, and (2) This paper is organized as follows. Section 2 introduces the model and the estimation framework. Asymptotic properties of the estimators and parameter inference are discussed in Section 3. The Monte Carlo simulations in Section 4 compare asymptotic results to finite sample performance. An in depth discussion of the Environment Kuznets Curve can be found in Section 5. Section 6 concludes. The proofs of the main theorems are collected in the Appendix and further information is available in the Supplementary Material. Finally, some words on notation. The integer part of the number a ∈ R + is denoted by [a] . For a vector x ∈ R n , its p-norm is denoted by the induced p-norm is defined as A p = sup x 0 Ax p / x p . We will omit the subscripts whenever p = 2. The (n × n) identity matrix is denoted I n and ı n signifies an n-dimensional column vector with all entries equal to 1. The block-diagonal matrix diag[A 1 , . . . , A n ] stacks the matrices A 1 , . . . , A n along its diagonal. We omit the integration bounds whenever the integration interval is [0, 1]. The symbol " d =" stands for equality in distribution, and "−→ p " and "−→ d " denote convergence in probability and in distribution. If convergence occurs conditionally on the sample, then we add a superscript "*" to the standard notation. The probabilistic Landau symbols are O p (·) and o p (·). Finally, the generic constant C can change from line to line. Our model specification enriches the Seemingly Unrelated Cointegrating Polynomial Regressions (SUCPRs) from Wagner et al. (2020) with a flexible deterministic trend. That is, each individual series in the system is affected by specific deterministic variables and integrated regressors (and their integer powers) while a global flexible trend describes nonlinear behaviour that is prevalent across all series. The resulting Generalized Cointegrating Polynomial Regression (GCPR) is given by: we write y i,t = τ g t θ + z i,t β i + u i,t , where z i,t = 1, t, x i,t , . . . , x p i i,t and β i = τ 1,i , τ 2,i , φ 1,i , . . . , φ p i ,i . Finally, we stack all N equations in (2.1) in matrix form to retrieve with y t = y 1,t , . . . , y N,t , Z t = diag z 1,t , . . . , z N,t , and the vector β = β 1 , . . . , β N of length p = 2N + N i=1 p i containing all local parameters. We consider nonlinear least squares (NLS) estimators of the unknown parameters in (2.1). As such, The optimization problem in (2.3) is easy to solve. For any given θ, model (2.2) is linear-in-parameters and the minimizers for τ g and β can be found from an OLS regression by constructing Z t (θ) = t θ ı N Z t and computing We subsequently minimize the concentrated criterion function Q T (θ) = Q T θ, τ g (θ), β(θ) to obtain θ T . At last, we plug in θ T and recover τ g,T and β T through a final OLS estimation. Keeping the powers of x i,t fixed allows us to test for their significance and thereby distinguish between nonlinearities caused by deterministic and stochastic trends. This is important for our empirical application on the Environmental Kuznets Curve, see Section 5. Hu et al. (2021) study a model with a flexible power of the integrated regressor. That is, these authors derive the limiting distribution of the NLS estimators for β and γ when y t = β|x t | γ + u t with β 0. The GCPR of (2.1) can be extended in several directions. First, integer powers of deterministic trends can be added as long as Θ(ε) is adjusted accordingly (to avoid collinearity). Second, multiple explanatory variables can be included. Related to the EKC, the literature suggests examples such as: population density (Selden and Song (1994)), trade openness (Jalil and Feridun (2011)), energy prices (Al-Mulali and Ozturk (2016)), and educational level (Maranzano et al. (2021) ). For nonstationary variables, conditions similar to those on {x i,t } should be fulfilled (Assumption 2 below). Stationary variables should be strictly exogenous. Avoiding the elaborate notation which would otherwise arise, we focus on the baseline specification in (2.1). We subsequently study the asymptotic properties of the NLS estimators. To this end we first collect all the unknown parameters in the vector γ = θ, τ g , β . This vector is assumed to be an element of the parameter space Γ = Θ(ε) × R 1+p . The true parameter vector is γ 0 = θ 0 , τ g,0 , β 0 . The global trend is relevant, i.e. τ g,0 0. The first assumption is needed to avoid identification issues. That is, if τ g,0 = 0, then θ is not identified and the Davies problem arrises when testing H 0 : τ i = 0 (see Davies (1977 Davies ( , 1987 ). Such complications are not investigated here and this is further reflected in our model specification (2.1). That is, we consider flexible powers of the deterministic trends but fixed powers of the stochastic trends, hence allowing us to test zero restrictions on (elements of) β. This is of crucial importance in the EKC application while determining whether nonlinear effects in the economic growth variables (x i,t ) remain significant after nonlinear time trends have been added to the model. Assumption 1 has been relaxed in the literature albeit for different models. Baek et al. (2015) and Cho and Phillips (2018) study the asymptotic behaviour of a quasi-likelihood ratio test when Assumption 1 is violated and the conditional mean of the data contains strictly stationary regressors and a flexible time trend. Alternatively, one can use drifting parameter sequences with different identification strengths as in Andrews and Cheng (2012) . Assumption 2 excludes cointegration among elements of x t and defines this vector as the partial sum of a short memory process. The latter implies that 1 Subscripts refer to specific elements. For example, B v i and ∆ v i u j denote the i th and (i, j) th elements of B v and ∆ vu , respectively. A concise exposition of our results asks for additional notation. An enumeration of various definitions is presented below. (1) Introduce D (i),T = diag 1, T, T 1/2 , T, . . . , T p i /2 to scale the deterministic and stochastic trends within each equation. For the full system of equation, define D Z,T = diag D (1),T , . . . , D (N),T , (3) For the second-order bias terms, we define Theorem 1 Under Assumptions 1-2, we have as T → ∞ and N fixed. The proof of Theorem 1 is closely related to the work by Chan and Wang (2015) . These authors provide the asymptotic distribution of NLS estimators under a set of general conditions in univariate, nonstationary time series models (see their theorem 3.1). The results in Chan and Wang (2015) and Wang et al. (2018) suggest that Assumption 2 can be replaced by a long memory specification for ∆x t . However, long memory parameters will enter the limiting distribution and inference will be complicated further. We illustrate Theorem 1 with two examples. These examples highlight the two mathematical features that complicate parameter inference. We consider y t = τt θ + u t with innovations satisfying Assumption 2. The limiting distribution of the parameter estimators depends solely on the mean square Riemann-Stieltjes integrals τ 0 r θ 0 ln(r)dB u and r θ 0 dB u , and is therefore normally distributed (e.g., section 2.3 in Tanaka (2017)). We have with the result in theorem 6.3 of Phillips (2007). If y t = τt θ + φx t + u t , then the limiting distribution of the NLS estimator is: This limiting distribution exhibits second order bias when ∆ vu 0, or when B u and B v are correlated. Two features of the limiting distribution of G γ 0 ,T γ T − γ 0 deserve further comments. First, as emphasised in Examples 1-2, the scaling matrix G γ 0 ,T features two less common properties: (1) this matrix depends on the true parameters τ g,0 and θ 0 , and (2) G γ 0 ,T is not diagonal. These peculiarities are caused by the nonlinearity and nonstationarity of the model. More specifically, these features can be traced back to the presence of the global trend. Limiting distributions with a similar mathematical structure can be found in the structural breaks literature, cf. model setting II.b of Perron and Zhu (2005) and its detailed analysis in Beutner et al. (2020) . Second, the nonstationary regressor x i,t enters the model (2.1) through a polynomial transformation of the form g(x i,t , φ i ) = φ i,1 x i,t + . . . + φ i,p i x p i i,t (i = 1, 2, . . . , N). In the terminology of Park and Phillips Chan and Wang (2015) , that this leads to second-order bias terms and hence nonstandard inference (except for the special case of strictly exogenous nonstationary regressors). Correcting for second-order bias terms typically involves estimating long-run variance (LRV) matrices. This subsection establishes that the NLS residuals can be used to construct consistent kernel estimators for the LRV matrices ∆ and Ω. Defining V t (γ) = u t (γ) , ∆x t ] with u t (γ) = y t − τ g t θ ι N − Z t β, these LRV estimators are defined as for some kernel function k(·) and bandwidth parameter b T . The first N elements of V t ( γ T ) are the elements of the residual vector u t = y t − τ g,T t θ T ı N − Z t β T . The remaining elements are ∆x t = v t . Assumption 3 (a) k(0) = 1, k(·) is continuous at zero, and sup x≥0 |k(x)| < ∞. The conditions on the kernel function k(·), Assumptions 3(a)-(b), are identical to those in Jansson (2002) . Jansson (2002) remarks that these assumptions "would appear to be satisfied by any kernel in actual use". Commonly used kernel functions such as the Bartlett, Parzen, and Quadratic Spectral kernels indeed satisfy all these assumptions. Assumption 3(c) differs from the usual requirement, lim T →∞ b −1 T + T −1/2 b T = 0, by a factor ln T . The difference is caused by the estimation error in θ T . This error causes the residuals { u t } to be less close to the innovations {u t } and we balance this by including autocovariance matrices of higher lags at a slower pace. Under Assumptions 1-3, we have ∆ T −→ p ∆ and Ω T −→ p Ω. The limiting distribution in Theorem 1 is nonpivotal and thus not directly suited for inference. Some popular solutions for linear-in-parameters cointegration models are: Saikkonen's (1992) dynamic least squares, the integrated modified OLS and fixed-b approaches by Vogelsang and Wagner (2014) , and the fully modified approach advocated in Phillips and Hansen (1990) and Phillips (1995) . For a nonlinearin-parameter model as in (2.1), a preliminary Monte Carlo exercise 5 shows poor performance for fully modified inference but promising results for a simulation based approach. We pursue the latter method for the remainder of this paper. The main idea behind the simulation based approach is to replace nuisance parameters by consistent estimates and to rely on Monte Carlo (MC) simulations to approximate the limiting distribution. The empirical quantiles of these MC draws allow us to conduct inference. Clearly, this kind of approach will provide exact inference when the limiting distribution is invariant with respect to the nuisance parameters (e.g. Dufour and Khalaf (2002) and Dufour (2006) ). In the absence of such invariance, Wang et al. (2018) and Bergamelli et al. (2019) show that the simulation approach remains asymptotically justified in several model specification. We adapt the algorithm from Wang et al. (2018) to the current setting and prove its asymptotic validity. Algorithm 1 (Simulation-Based Inference) Step 1: Estimate γ T and use the residuals { u t } to compute the estimators ∆ T and Ω T from (3.2). Step 2: Repeat for j = 1, . . . , J, (a) Draw random variables {e t } M T t=1 i.i.d. from N(0, I 2N ). (b) Compute µ t v t = Ω 1/2 T e t and the partial sum χ t = χ 1,t , . . . , χ N,t = t s=1 υ s . (c) Let J t; γ T = τ g,T t θ T ln t ı N , t θ T ı N , Z t , where Z t = diag z 1,t , . . . , z N,t with z i,t = 1, t, χ i,t , . . . , χ p i i,t (for i = 1, . . . , N). For a given M T , construct the j th simulated draw as Step 3: Use the empirical quantiles of elements of J (1) , . . . , J ) to conduct inference. Algorithm 1 uses a discretisation in M T steps to approximate the limiting distribution of the parameters. In practice, and in accordance with Theorem 3, we can take M T = T . Remark 3 details how simulation-based inference can be used to test hypotheses concerning the model's parameters. Discussions on size and power are also presented there. Suppose Assumptions 1-3 hold, let {M T } ⊆ (0, ∞) with lim T →∞ M T T ≤ κ, for some κ < ∞, then we have in probability. Theorem 3 establishes the asymptotic validity of the simulation approach. That is, for a large enough J, the empirical quantiles of the simulated distribution will coincide with the asymptotic distribution. Two remarks are important. First, even though the simulation algorithm is adapted from Wang et al. (2018), the proof of Theorem 3 is not. In particular, the method of proof is similar to Theorem 1 and continues to allow for endogeneity of the regressors. Second, the simulation approach mimics the stochastic integrals in the limiting distribution directly. It therefore suffices to draw normally distributed random variables in Step 2(a) and use consistent long-run covariance estimates to replicate the covariance structure of the underlying Brownian motions. Compared to a bootstrap procedure, this simulation approach has the advantage of avoiding tedious NLS re-estimation on bootstrap samples but it comes at the cost of forsaking possible asymptotic refinements. Step 3 in Algorithm 1 has been kept general for notational convenience. An illustrative example is as follows. Assume we are interested in H 0 : φ 2,1 = 0 (irrelevance of the regressor x 2 1,t ) when Under H 0 , we have T 3/2 φ 2,1 = e 6 G γ 0 ,T γ T − γ 0 −→ d e 6 J (γ 0 ) with e k being the k th basis vector in R 2+p . Denoting the empirical ζ-quantiles of e 6 J (1) , . . . , e 6 J (J) by c ζ , a test of size α will reject for T 3/2 φ 2,1 < c α/2 or T 3/2 φ 2,1 > c 1−α/2 . Under the alternative φ 2,1 0, we rewrite the test statistic as T 3/2 φ 2,1 = T 3/2 φ 2,1 − φ 2,1 + T 3/2 φ 2,1 . Statistical power is guaranteed because the simulation approach mimics the asymptotic distribution and is thus bounded, whereas the second term diverges. The correct specification of the nonlinear cointegrating relation will result in a stationary error process {u t } t∈Z . We consider a KPSS-type test statistic for the null of stationarity. The candidate statistic is shown that subsampling can resolve this issue. We will follow their approach and use subsamples of size q T to compute the test statistics. Theorem 4 where W (·) denotes an N-dimensional standard Brownian motion. Theorem 4 does not provide any guidance on the choices for the starting value and the subsample size q T . First, for a given q T , Choi and Saikkonen (2010) argue that the use of a single subsample (instead of all T observations) implies a significant loss of power. We follow their example and combine all M = [T/q T ] subresidual series of length q T using a Bonferroni procedure. That is, we create subresiduals series by selecting adjacent blocks of q T residuals while alternating between the start and end of the sample. We calculate the KPSS-type test statistic for each subseries, say K 1 , . . . , K M , and reject the null of stationarity at significance α whenever max{K 1 , . . . , K M } exceeds c α/M which is defined by P W (r) 2 dr ≥ c α/M = α/M . Finally, we select the block size q T using Romano and Wolf's (2001) minimum volatility rule. The approach is now completely data-driven. 6 Proposition 5 in Wagner and Hong (2016) shows that the limiting distribution of K + T is free of nuisance parameters if θ 0 is known and only a single integrated regressor occurs with integer powers greater than one. This result does not carry over to the current setting because of the estimation error in θ T . Section 3 provide useful guidance in finite samples. Further details on the implementation are as follows. The long-run covariance matrices in (3.2) are computed using the Barlett kernel, k(x) = 1 − |x| for |x| ≤ 1 (and zero otherwise), and the bandwidth selection method described in Andrews (1991) . Simulated limiting distributions are based on J = 299 replicates and we set M T = T . We test at 5% significance and report results based on 3, 000 Monte Carlo replications. Two data generating processes are studied: DGP1 and DGP2. This DGP is inspired by the simulation study in Wagner et al. (2020) . It augments their quadratic seemingly unrelated cointegrating polynomial regression model with a global flexible trend. That is, we consider and compute u i,t and ∆x i,t = v i,t recursively as Regarding the global trend in (4.1), we set τ g = −0.2 and consider θ ∈ {0.8, 1.3, 1.8}. All other coefficient values are inspired by Wagner et al. (2020) . That is, τ 1,i = 1, τ 2,i = 1 and φ 1,i = 5 are identical across equations. 7 Also, we let ρ 1 = ρ 2 = ρ 3 = ρ 4 and redefine these four parameters as ρ. We vary ρ ∈ {0, 0.3, 0.6, 0.8}, N ∈ {3, 5, 10}, and T ∈ {150, 300, 600}. In line with the typical EKC application, we test for the significance of x 2 i,t . We set φ 2,i = 0 for i = 1, . . . , N and report the empirical size of the single equation test for H 0 : φ 2,1 = 0 and the joint test for H 0 : φ 2,1 = . . . = φ 2,N = 0. For θ 0 = 1.3, the empirical size of various tests are displayed in Table 2 The empirical size (in %) of the single-equation tests H 0 : φ 2,1 = 0 and the joint test for H 0 : φ 2,1 = . . . = φ 2,N = 0 with φ 2,i denoting the coefficient in front of x 2 i,t . The Monte Carlo results are based on: simulated inference with θ estimated by NLS (SimNLS), simulated inference with known θ = 1.3 (SimNLS(θ 0 )), and two Fully Modified estimators for systems developed by Wagner et al. (2020) with known θ = 1.3 (FM-SOLS(θ 0 ) and FM-SUR(θ 0 )). whereas FM-SOLS and FM-SUR are oversized. We subsequently simulate power curves. 9 The specification of the single equation test and joint test are as before but we now vary φ 2,1 = . . . = φ 2,N over the set [−0.008, −0.007, . . . , 0]. We take ρ = 0.3, θ = 1.3 and N = 3 as the baseline scenario and subsequently vary these quantities one-by-one. Figures 1-2 show the results. As expected, power increases with increasing sample size, and as φ 2,i moves away from zero. Our second set of simulations is tailored towards the empirical application. That is, we employ parametrizations that mimic the distributional properties of data. Generally speaking, we first estimate φ 2,i = 0 is estimated on the data. We subsequently move φ 2,1 = . . . = φ 2,6 away from zero in the DGP and check whether we can detect the resulting curvature caused by the integrated variable. 10 All details on the simulation designs for DGP2(a)-(c) are available in Section S7 of the Supplementary Material. As in (a), we vary φ 2,1 = . . . = φ 2,6 and test for the significance of these parameters. The solid lines in Figures 3(c) and 3(d) are power curves obtained using the correctly specified DGP whereas markers indicate the power when a redundant global trend is estimated as well. The redundant trend has virtually no influence on the statistical power of the coefficient tests of the first five series. There is a power loss for i = 6. An inspection of the coefficients offers an explanation. The estimated coefficients in front of the global trend are mostly small (10 −10 to 10 −9 ) and thus irrelevant. However, in a fraction of cases the flexible trend mimics the curvature in the 6 th series causing the quadratic stochastic trend to become insignificant. As reported in the introduction, the power of the joint test does not suffer from the inclusion of a redundant trend. (c) KPSS test: Nonstationary residuals are an indication of model misspecification. That is, either the regression is spurious or the functional form of the cointegrating relation is misspecified. We look at the latter situation. The simulation DGP is the quadratic GCPR as in DGP2 (a) We examine the evidence for an EKC for a collection of 18 countries over the period 1870-2014 (T = 145). Economic growth is measured by GDP and we use carbon dioxide (CO 2 ) emissions as a proxy for air pollution. The origin of these data is as follows. We use population and GDP data from the Maddison Project (see https://www.rug.nl/ggdc/historicaldevelopment/maddison/). Our carbon dioxide observations are fossil-fuel CO 2 emissions as made available by the Carbon Dioxide Information Analysis Center (CDIAC, see https://cdiac.ess-dive.lbl.gov). The CDIAC database ceased operation in per capita and subsequently log-transformed. In accordance with the notation of this paper, we will denote them by x i,t and y i,t , respectively. The same data (or subsets thereof) have also been studied by Wagner (2015), Chan and Wang (2015) , Wang et al. (2018), Wagner et al. (2020), and Lin and Reuvers (2020). 11 This conveniently allows us to compare results. All user choices (kernel specification, bandwidth selection, etc.) are kept the same as during the simulation study (see page 17). Prior to the analysis of a multivariate specification, we will first discuss several features of the individual time series (hence omitting subscripts "i"). The example throughout this narrative is Belgium ( Figure 4 ). 12 An inverted U-shaped relationship between GDP and CO 2 (both in log per capita) is clearly visible in Figure 4 (a) and behavior like this has triggered research on the Environmental Kuznets Curve. However, the time heat map also shows that time is almost monotonically increasing along the curve. Time effects -e.g. increasing global environmental awareness, worldwide advances in sustainable technologies -can be valid alternative explanations for these nonlinearities and their omission can (falsely) exaggerate the influence of GDP. It is for this reason that we develop and analyse the Generalized Cointegrating Polynomial Regression (GCPR). More evidence for the importance of time effects is available in Figure 4 (b). This figure depicts the same per capita series after detrending. 13 The inverted U-shape is now (visually) less pronounced or 11 The stationarity properties of the series have been extensively studied and discussed in these papers. We will not repeat this analysis but refer the interested reader to Section S8 of the Supplement. The exact numbers may show (minor) differences from previously reported results due to differences in: (1) the time span of the data, (2) the implemented long-run covariance estimator, and (3) the scaling of the data. Related to scaling, we follow the official guidelines and multiply by 3.667 and 10 3 to convert thousand of metric tons of carbon into units of carbon dioxide. Since the data will be expressed in logarithms, this rescaling effectively amounts to a change of intercept. 12 The data for Austria, Belgium, and Finland are mentioned in both Wagner (2015) and Wagner et al. (2020) to behave in line with the EKC. We discuss Belgium in the main text but the interested reader can find the same figures for Austria and Finland in Section S8.3. Qualitatively, the findings for these other two countries are the same. 13 The Perron and Yabu (2009) test allows us to test for the presence of a deterministic trend irrespectively of the series being trend-stationary or having an unit root. The results of this test (see supplement) indicate that log per capita GDP is likely to have a deterministic trend component. It is thus recommended to have a deterministic trend in the model for log per capita CO 2 emissions and the visual inspection of the relationship between GDP and CO 2 emissions (in log per capita) should take place after partialling out this deterministic trend. even absent. Finally, let us depart from a traditional linear cointegration specification: This model cannot incorporate any nonlinear behaviour over time and is therefore ill-suited to fit the data displayed in Figure 4 of the NLS estimator for this specification is shown in Figure 4 (d). The absence of a minimum at θ = 2 casts doubt on the commonly used quadratic specification in x t . Additionally, the lack of any minimum might be interpreted as a sign that log per capita GDP is not the source of nonlinearity. This finding is not specific for Belgium. There are no minima in the RSS for 15 out of 18 countries (see Section S8.4). For the remaining three countries -Denmark, France and the Netherlands -minima are found at θ DK = 1.46, θ FR = 3.61 and θ NL = 1.28, respectively. Alternatively, we can describe the nonlinearity in the data using a flexible deterministic trend as in The interpretation of a country-specific flexible deterministic trend is complicated because of its high collinearity with GDP per capita. The multivariate analysis of this section allows us to separate country-specific environmental improvements caused by national income growth from global environmental improvements. We study the following six countries (N = 6): Austria, Belgium, Finland, the Netherlands, Switzerland, and the UK. The motivation behind this choice is as follows. First, based on data series to ours, Piaggio and Padilla (2012), Mazzanti and Musolesi (2013) , and Wagner et al. (2020) report considerable evidence of parameter heterogeneity across countries. 14 The evidence in Mazzanti and Musolesi (2013) is anecdotal in the sense that these authors consider groups of similar countries and find different results for different groups. The lack of overlap among confidence intervals of country-specific parameters has also been interpreted as a sign of heterogeneity (section 4.2 in Piaggio and Padilla (2012)). Wagner et al. (2020) explicitly test for various forms of poolability and conclude that pooling is (at most) appropriate for small subgroups of countries. This lack of parameter homogeneity justifies a multivariate approach with small N rather than a panel setting. Admittedly, in the current time-series setting, studying "large N" is also infeasible since consistent estimators for (2N × 2N) long-run covariance matrices are required. Second, prior studies already refute the existence of a carbon dioxide EKC for several countries and little seems lost by excluding these countries from the outset. 15 That is, we consider the same countries as in Wagner et al. (2020), who decide on these countries because their prior cointegration analysis "leads to evidence for a quadratic cointegrating EKC including a constant and linear trend". Having decided on the set of countries, we subsequently study the effect of the global flexible trend on EKC evidence. Table 3 shows the estimation results of the quadratic EKC specification This setting (possibly with the additional constraint τ 2,i = 0) has been explored in numerous papers, for example: Selden and Song (1994) (2005). 15 Most of the parameters in the Generalized Cointegrating Polynomial Regression are country-specific. The estimation accuracy of these parameters should deteriorate little when focussing attention on a subset of countries. Losses will occur in the precision of the estimators for τ g and θ. There is thus a trade-off between accurate global trend estimation (improving with large N) and accurate LRV estimation (deteriorating with large N). To strike a balance and to connect to the recent literature, we continue the analysis of Wagner et al. (2020) and take N = 6. Table 3 : Parameter estimates and test results for Models (M1)-(M3). The joint p-value refers to the test with null hypothesis H 0 : φ 2,1 = . . . = φ 2,6 = 0 and is thus inapplicable for Model (M3). Global Trend whereas evidence against these null-hypotheses is less pronounced for the simulation-based approach. The same behaviour emerges when testing φ 2,1 = . . . = φ 2,6 = 0 jointly. This pattern reminds of the simulation results in Table 2 From a statistical perspective, the term τ g t θ opens a different channel through which nonlinearities can be described. We refer back to the introduction for a further elaboration on this point. From an economic perspective, τ g t θ captures changes in CO 2 emissions that are common across series and thus unrelated to changes in national GDPs. Parameter inference for Model (M2) is also reported in Table 3 . The contributions of x 2 i,t are insignificant for both individual countries and all countries jointly. How about the significance of the global trend? The standard Wald test for τ g = 0 is invalid because θ is unidentified under the null hypothesis (see Assumption 1 and the related discussion). As a heuristic alternative, we vary θ over the interval [0, 2.5] and compute Wald statistics while assuming θ to be fixed. Comparing these Wald statistics to the 95% quantile of a χ 2 (1)-distributed random variable (critical value: 3.842), the range of θ-values from about 0.5 to 1.75 implies a significant global trend ( Figure 5 ). Having estimated θ = 1.263, our analysis suggests that the global trend and not GDP per capita is the source of nonlinearity. Before interpreting this result, we first verify whether the model with φ 2,1 = . . . = φ 2,6 = 0 shows signs of misspecification. Omitting insignificant parameters from the previous model specification, we arrive at Model (M3) is linear in log per capita GDP. The positive parameter estimates for φ 1,i imply that at a given point in time increases in economic growth imply increases in CO 2 emissions. However, as τ g = −1.374 × 10 −5 and θ = 2.45, there will be common emission reductions over time. Also, the omission of the quadratic terms in log per capita GDP do not seem to result in a misspecified model. First, the KPSS test does not reject the null of cointegration. Second, there is no (visual) evidence that the linear functional form of (M3) is inappropriate. To arrive at this last conclusion, we compute y i,t = y i,t − τ g t θ − τ 1,i − τ 2,i t and employ the nonparametric kernel estimator from Wang and Phillips (2009) to estimate y i,t = f (x i,t ) + u i,t for each individual country. 16 Figure 6 shows the nonparametric estimate in blue and the fit of Model (M3) in red. After removal of the global trend, there are some temporary departures from linearity but there is little curvature overall and certainly no visual turning point. In Table 4 , we formally test the null of linearity using the model specification test as outlined in section 3 of Wang and Phillips (2016). Based on the full sample, linearity is rejected for Austria only. A comparison with the 95% confidence intervals of the kernel estimate ( Figure 6 ) suggests that this rejection is caused by the sharp decline in CO 2 emissions during World War II. We subsequently repeat the analysis using the T = 69 observations after 1945. Linearity is never rejected. 17 All this align well with our earlier findings of a relevant global trend and irrelevant quadratic effects in log GDP per capita. The preceding analysis suggests that the global flexible trend captures omitted determinants of CO 2 emission levels that have been decreasing over time. In their analysis, Grossman and Krueger (1995) already included a global deterministic trend in their model because they "did not want to attribute to national income growth any improvements in local environmental quality that might actually be due to global advances in the technology for environmental preservation or to an increased global awareness of the severity of environmental problems". Indeed, since reliable data on green technology adaptation 18 and global awareness is scarcely available (certainly for time horizons allowing for a cointegration analysis), these variables are likely missing and thus requiring a proxy. Similar remarks are applicable 16 The properties of nonparametric kernel estimators in nonlinear cointegration models have been studied by Wang and Phillips (2009), Gao et al. (2015) and Wang and Phillips (2016), among others. The latter reference is particularly relevant because it establishes that kernel estimators remain consistent and asymptotically (mixed) normal under serially correlated errors and endogeneity. None of these papers includes deterministic trends in the DGP. However, we conjecture that detrending does not affect the asymptotic properties of the kernel estimator due to the high convergence rates of the trend parameters in comparison to the slow convergence rates of the nonparametric estimator. Our bandwidth choice is h = T −1/3 . 17 The high p-values in Table 4 are caused by visually small deviations from the linear trend (see Section S8.6 in the Supplement for detailed graphs). Also, the relatively small sample size (for nonparametric settings) might adversely affect power. Matlab functions for nonparametric kernel regression and specification test are available at https://github.com/HannoReuvers. For remarks on bandwidth choice and detrending, we refer to footnote 16. 18 Nordhaus (2014) discusses the link between climate change and technological changes. As another example, Figure 2 in Gillingham and Stock (2018) reports a steady decline in the price of solar panels and a steady growth in solar panel sales. Cheaper solar energy can substitute fossil energy thereby reducing pollution. to variables such as pollution control policies 19 . In reduced-form models, an EKC finding is typically explained by national income being the proxy for these omitted variables. That is, at higher levels of national income, countries have access to cleaner technologies and its citizens show greater appreciation for the environment and pollution legislation. The current analysis contradicts these income effects and points towards improvements being captured by a global trend. Our final model specification, Model (M3), is linear in log GDP per capita. Moreover, for a given year, the coefficient estimates suggest that increasing national income by 1% implies an increase in carbon-dioxide emissions of about 1%-2.5% (depending on the country). This result seems plausible for non-carbon-neutral economies. However, CO 2 emissions in Austria, Belgium, Finland, the Netherlands, Switzerland, and the UK are jointly reducing at the end of the sample. What cause these global emission reductions? Mazzanti and Musolesi (2013) suggest that conglomerates of countries anticipate and respond to international climate agreements such as the Rio convention (1992) and Kyoto protocol (adopted in 1997; operational since 2005). Interestingly, the latter agreement contains emission reduction targets to be reached in 2020 and such "working-towards-a-common-reduction-deadline" does point towards a time effect. 20 Alternatively, given our sample of European countries, EU coordinated emission reduction efforts like the EU Emissions Trading System (ETS) can be a driving force behind these common emission decreases. (2013)) but a methodological approach accounting for nonstationary regressors is currently unavailable. We fill this gap. The unknown powers of the global trend are estimated jointly with the parameters in the cointegrating relation. The limiting distribution is nonstandard due to a non-diagonal scaling matrix and second order bias terms. We therefore suggest a simulation-based approach to conduct inference. The usual subsampling KPSS-type for stationarity of the innovations of the nonlinear cointegrating relation remains valid. Our results are supported by Monte Carlo simulation. The empirical application on the Environmental Kuznets Curve shows that a flexible trend can fully capture the nonlinearity in the data thereby making higher order powers of log per capita GDP redundant. Our resulting model is linear in log per capita GDP and suggests an alternative explanation in which time effects -e.g. technological progress, increasing environmental awareness, tightening pollution policy -rather than economic growth cause the recent slowdown in CO 2 emissions. Contrary to the opening quote in the introduction, our analysis suggests that CO 2 emissions increase with economic growth. Carbon dioxide emissions do decrease due to time effects. 20 According to the Doha amendment of the Kyoto protocol, the reduction commitments were 92% (over the period 2008-2012) and 80% (over the period 2013-2020) of 1990 emission levels for Austria, Belgium, Finland, the Netherlands and the UK. For Switzerland, the reduction target was also 92% (over the period 2008-2012) but 84.2% (over the period 2013-2020). (source: https://unfccc.int/files/kyoto protocol/application/pdf/kp doha amendment english.pdf). Earlier versions of this paper have been presented at the 2019 CFE meeting in London, the Econometrics Internal Seminar (EIS) at Erasmus University Rotterdam, and the Brownbag Seminar at Vrije Universiteit Amsterdam. We gratefully acknowledge the comments by the participants. We extend our thanks to Eric The same series as in subfigure (a), but now using detrended variables. (c) The log per capita CO 2 emissions time series for Belgium over time. (d) The residual sum of squares (RSS) for the nonlinear model specification y t = τ 1 + τ 2 t + φ 1 x t + φ 2 x θ t + u t for various values of θ. (e) The RSS as a function of θ for the flexible nonlinear trend specification y t = τ 1 + τ 2 t + τ 3 t θ + φx t + u t . (f) The relation between x t and y t after partialling out the constant, linear trend, and flexible deterministic trend. Proof of Theorem 1 In view of the identity a + b 2 = a 2 + b 2 + 2a b, we also have The proof proceeds along the lines of Lemma 1 of Andrews and Sun (2004) and Theorem 3.1 of Chan and Wang (2015) . The proofs separate into two parts. The first part uses a Taylor expansion of Q T (γ) around Q T (γ 0 ) to recover a quadratic approximation for Q T (γ) on the set Γ δ,k T ⊆ Γ . In the second part, we obtain the limiting distribution from this quadratic approximation. For any γ ∈ Γ δ,k T , the Taylor expansion of Q T (γ) around γ 0 reads whereγ is a point on the line segment connecting γ and γ 0 , and the various derivatives of Q T arė We shall show that the minimum of Q T (·) over γ ∈ Γ T (ε) is attained in the interior of Γ T (ε). The next two statements are proven later: Given claim (b), for any ε > 0, we have P Γ T (ε) ⊂ Γ δ,k T → 1 as T → ∞ because G −1 γ 0 ,T → 0. Define Clearly, γ * T is an interior point of Γ T (ε) as long as ε > 0. Subsequently select a γ ε ∈ ∂Γ T (ε), i.e. γ ε is a boundary point of Γ T (ε). From (A.2), we have where µ T a random vector with µ T = ε, andγ ε is a point on the line segment connecting γ ε and γ 0 . The pointγ * T is defined similarly. Moreover, the final equality follows from → 1 for any boundary point γ ε and the minimum of Q T must be attained at an interior point of Γ T (ε), say γ T (ε). As in Andrews and Sun (2004) where J T → ∞, satisfying the first-order conditions P Q T γ T = 0 → 1 as T → ∞. As a result, we obtain It remains to verify claims (a) and (b). We consider the sequence Γ δ,k T for k T =κ ln T andκ > 0. There exists T * > 0 such that whenever T > T * , where N κ,T (γ 0 ) is given in (S23), and κ = Cκ with some constant C > 0. Claim (a) thus holds if sup γ∈N κ,T (γ 0 ) G −1 Using the identity y t − Z t β ı N = Nτ g0 t θ 0 − β − β 0 Z t ı N + u t ı N and Lemma S3, it is easily shown that the supremum of each element is indeed o p (1). Claim (b) follows directly from the weak convergence results in Lemma S2. That is, A T −→ d J (r; γ 0 )J (r; γ 0 ) dr and b T −→ d J (r; γ 0 ) dB u (r) + B vu as T → ∞. Theorem 1 now follows from (A.3). The proof is to a large extent an application of theorem 2 in Jansson (2002). We provide the details in the Supplement. We abbreviate M = M T . By simple rearrangements, we obtain Looking at the elements of R 1,M , we conclude that R 1,M = o * p (1) if results similar to those in Lemma S3(i)-(iii) continue to hold. Two conditions need to be verified: (b1) P γ T ∈ N κ,M (γ 0 ) → 1 with N κ,M (γ 0 ) similarly defined to (S23), (b2) the stochastic order of terms remains the same when replacing Z m by Z m , conditional on the sample (x 1 , y 1 ), . . . , (x T , y T ). For condition (b1), using set inclusions similar to those below (A.3), it suffices to show γ T ∈ γ ∈ Γ : G γ 0 ,M (γ − γ 0 ) ≤κ ln M with large probability for someκ > 0. This is trivial because . All these stochastic orders are a consequence of (A.5) and a straightforward modification of Lemma S2. Overall, we have R 2,M = o * p (1). It remains to look at the leading terms in (A.4) . The elements of Ω and ∆ are always multiplicative in the construction. From S M −→ p I p+2 , (A.5), and Lemma S2, we have in probability. The last two terms in (A.7) equal B vu , because Ω + ∆ Similarly, The theorem follows after combining the limiting distribution of these leading terms through (A.4). Proof of Theorem 4 Without loss of generality, we set = 1. Subsequently, note that where the stochastic order of the remainder terms R q T ,1 -R q T ,3 follows from Lemma S3 and Theorem 1: The theorem follows from (A.9), a functional central limit theorem for linear processes, the continuous mapping theorem, and the rate requirements. Supplement to "Cointegrating Polynomial Regressions with Power Law Trends: Environmental Kuznets Curve or Omitted Time Effects?" S1 Simulation DGP used for Introduction S1 S2 Auxiliary Lemmas S1 The simulation DGPs of the introduction are based on the data for Austria, Belgium and Finland. Parameter values are (nonlinear) least squares estimates and innovations are mean-zero normally distributed random variables with a covariance matrix estimated from the residuals and ∆x t . The specific parametrization for the model with global trend is where 18.86 * * * * * 1.35 2.02 * * * * 3.68 2.65 1.88 * * * 0.10 0.38 1.46 0.83 * * 0.45 0.09 0.22 0.07 0.18 * 0.26 0.14 0.56 0.15 0.14 0.24 The simulations investigating the influence of the redundant trend follow Compared to the empirical application (and thus also DGP2 in Section 4), the main differences are the smaller N and the omission of serial correlation in both the innovations and the increments of the integrated variables. These modifications allow us to showcase the influence of the omitted global trend while not having to worry about the effects of long-run covariance estimation on statistical size. Lemma S1 (i) For a L > −1, we have sup a≥a L 1 T T t=1 t T a ≤ C. (ii) Under Assumption 2, for any a ≥ a L > − 1 2 , and any k ≥ 0, E 1 √ T T t=1 t T a (ln t) k u i,t 2 ≤ C(ln T ) 2k , i = 1, . . . , N. (iii) Under Assumption 2, for some a L and a U such that − 1 2 < a L < a U < ∞, and any k ≥ 0, (iv) If a L and a U satisfy −1 < a L < a U < ∞, and if k = 0, 1, 2, . . ., then Proof (i) This is shown in lemma 4 of Robinson (2012). (ii) Note that where we define γ i,s = E(u i,t u i,t−s ). For the given index ranges, we also have |t − s| ≤ t such that The first summation in the RHS of (S3) is bounded in view of Lemma S1(i) and ∞ s=0 |γ i,s | < ∞ due to Assumption 2(a) (cf. Appendix 3.A. in Hamilton (1994)). (iii) Using the equality t and a change in the order of summation, we find and hence For the first term in the RHS of (S4), we have E 1 ≤ C(ln T ) k by Lemma S1(ii) with a = 0. For the second term, note that To deal with the supremum of 1 + 1 s a − 1 over [a L , a U ], we define g a (x) = (1 + x) a − 1 for 0 ≤ x ≤ 1. If − 1 2 < a ≤ 1, then |g a (x)| ≤ |a|x by Bernoulli's inequality. If a ≥ 1, then convexity of g a (x) implies We conclude that |g a (x)| ≤ Cx for all a L ≤ a ≤ a U and x ∈ [0, 1]. Combining this result with (S5), we have where we used E s t=1 (ln t) k u i,t ≤ E s t=1 (ln t) k u i,t 2 1/2 ≤ Cs 1/2 (ln s) k (the steps in the proof of (ii) require a small modification to establish this) to go to the last line, and (i) to obtain the final inequality. The proof is complete since we have bounded both terms in the RHS of (S4). (iv) If we divide the integral into integration intervals of width 1 T , then we find We therefore conclude that It remains to bound the term Ic. Changing the integration variable to r = t T − s yields We subsequently derive an upper bound for the integrand using an approach which mimics the derivations in (D.14) and (D.15) in Robinson (2012) . For any 2 T ≤ ≤ 1 (such that 0 < s/ ≤ 1 2 ), we have by the triangle inequality and the fact that ( − s) a = ( − s) a . For IIa similar arguments as those found below (S5) give 1 − (1 − x) a ≤ Cx, and hence since | ln | ≤ | ln T | for all 2 T ≤ ≤ 1. For IIb we first note that 1 2 ≤ 1 − s/ < 1 and therefore (1 − s/ ) a < (1 − s/ ) −1 ≤ 2. Moreover, we use the factorization p n − q n = (p − q) n−1 j=0 p n−1− j q j to obtain 21 because 1/T ≤ − s < 1 and thus | ln( − s)| ≤ ln T . Combining all previous results for IIb gives IIb ≤ C a s ln T k−1 ≤ C a L −1 ln T k−1 1 T . Since 2 T ≤ ≤ 1, we use the bounds on IIa and IIb to bound the integrand of (S7) as follows: The asymptotic order of T t=1 t T a L −1 relies on the values of a L . We distinguish three cases: (1) if It is seen that Ia, Ib, and Ic converge to zero as T → ∞. The proof follows from (S6). 21 For any x > −1, we have the inequality Let Assumption 2 hold. For any a such that − 1 2 < a L ≤ a ≤ a U < ∞, any j ∈ {1, 2, . . . , p i }, i ∈ {1, 2, . . . , N}, and k ∈ {0, 1, 2, . . .}, as T → ∞, we have: where the third line is obtained using integration by parts of the mean square Riemann-Stieltjes integral, c.f. theorem 2.7 in Tanaka 1/T f T (r) − f (r) dB u i (r) are asymptotically negligible. These quantities are zero mean so it suffices to show that their variances vanish as T → ∞. By the isometry property and steps similar to those above (S7), we have ds. (S17) Now let ∈ 1 T , 2 T , . . . , 1 and recall that 0 ≤ s ≤ 1 T (hence also 0 ≤ s ≤ 1). Using the triangle inequality, the expression in absolute values can be bounded as (ln( + s)) k + a (ln( + s)) k − (ln ) k = a (1 + s/ ) a − 1 |ln( + s)| k + a (ln( + s)) k − (ln ) k = IIc + IId. (S18) By the inequality |g a (x)| ≤ Cx below (S5) and the fact that |ln( + s)| ≤ | ln | + | ln(1 + s/ )| ≤ ln T + s/ , we obtain IIc ≤ C a L s ln T + s k ≤ C a L −1 (ln T ) k 1 T . Moreover, the factorisation p n − q n = (p − q) n−1 j=0 p n−1− j q j yields By combination of the bounds on IIc and IId, we conclude that ( + s) a (ln( + s)) k − a (ln ) k ≤ C a L −1 (ln T ) k 1 T and arrive at the following upper bound on the RHS of (S17): The RHS of (S20) will go to zero as T → ∞, thereby establishing that is also asymptotically negligible. The proof of part (ii) is now complete. (iii) We have Given the CMT and X i,T −→ d B v i , term IIIa will converge weakly to 1 0 f (r)B j v i (r)dr if we can show that x → 1 0 f (r)x j (r)dr is a continuous functional. Let x, y ∈ D[0, 1]. Hölder's inequality implies Continuity of the functional now follows from (S22). If we apply the Cauchy-Schwartz inequality to IIIb, then we find We conclude that IIIb = o p (1). Now combine the limiting results for IIIa and IIIb to complete the argument. For any κ > 0, define Assume N is fixed and T → ∞. Let k 1 , k 2 be any nonnegative integers. Let Assumption 2 hold. Proof We only show (i), (iii), (v) and (vi). The proof of the remaining results is similar and thus omitted. (i) For any γ ∈ N κ,T (γ 0 ), by the triangular inequality and the mean-value theorem (MVT), where t |θ| ≤ T |θ−θ 0 | = exp (|θ − θ 0 | ln T ) ≤ C whenever T is sufficiently large. We obtain the first result due to Lemma S1(i) and (ln T ) k T θ 0 +1/2 = o(1) for any k ≥ 0. (iii) By Lemma S1(i), Lemma S2(iii), and the MVT, where the term o p (1) is uniform over γ ∈ N κ,T (γ 0 ). (v) By Part (i) and Lemma S2(iii), (vi) Using the MVT and Lemma S2(ii), we obtain The proof is completed. Proof We write ∆ T ≡ ∆ T ( γ T , b T ) and Ω T ≡ Ω T ( γ T , b T ) to make their dependence on the parameter estimator γ T and bandwidth b T explicit. Changing the summation indices, we can express the one-sided long-run covariance estimator as Clearly, it suffices to study the asymptotic behavior of Σ T ( γ T ) and Γ T ( γ T , b T ). As the lower right subblock of V t+i (γ)V t (γ) equals v t+i v t (no parameter estimation uncertainty here), the consistency result for this subblock follows from the properties of {v t } in Assumption 2, the kernel requirements in Assumption 3, and an application in Theorem 2 of Jansson (2002) . We proceed to the upper left subblocks of Σ T ( γ T ) and Γ T ( γ T , b T ). If the residuals are close enough to the true innovations, then the results from Theorem 2 of Jansson (2002) again applies. It suffices to show Using Lemmas S1-S2 and Theorem 1, the following key result follows immediately This implies T −1 T t=1 u t u t − u t u t = O p T −1/2 ln T and where the final step is due to lemma 1 of Jansson (2002) . Clearly, both terms are asymptotically negligible under the assumption T −1/2 b T ln T → 0 as T → ∞. The limits of the two remaining subblocks of Σ T ( γ T ) and Γ T ( γ T , b T ) are derived similarly. Invoking Theorem 1, we have It remains to show that the quantity in the RHS is normally distributed with a mean and variance as in (3.1) of the main paper. Consider an arbitrary vector c = [c 1 , c 2 ] and define Gaussianity is preserved under mean square integration (see, e.g., section 4.6 in Soong (1973)) and we proceed to the mean and variance of A c . From (4.190) in the same reference, we get E(A c ) = Ω 1/2 uu c 1 τ 0 r θ 0 ln(r) + c 2 r θ 0 dE W u = 0. Moreover, (2.16) in Tanaka (2017) yields Var A c = Ω uu c 1 τ 0 r θ 0 ln(r) + c 2 r θ 0 2 dr = Ω uu c τ 0 r θ 0 ln(r) 2 dr τ 0 r 2θ 0 ln(r)dr τ 0 r 2θ 0 ln(r)dr r 2θ 0 dr c. Our choice of c was arbitrary and thus τ 0 r θ 0 ln(r)dB u r θ 0 dB u ∼ N 0, Ω uu τ 0 r θ 0 ln(r) 2 dr τ 0 r 2θ 0 ln(r)dr τ 0 r 2θ 0 ln(r)dr r 2θ 0 dr . Use r θ 0 ln(r) 2 dr = 2 (2θ 0 +1) 3 , r 2θ 0 ln(r)dr = − 1 (2θ 0 +1) 2 , and basic linear algebra to recover the result. We consider N = 1 and test H 0 : φ 2 = 0 versus H a : φ 2 0 with FMOLS. Specifically, we generate the data according to where x t = t s=1 v s . The chosen parameter values are θ = 2, τ = [τ 1 , τ 2 , τ g ] = [7, 0.05, −5 × 10 −4 ] , and φ = [φ 1 , φ 2 ] = [5, 0] . These parameter values are representative. The disturbance vector [u t , v t ] is generated from the VAR(1) specification 22 We construct the autoregressive matrix A along the following two steps: (1) generate a (2 × 2) random matrix U from U[0, 1] to construct the orthogonal matrix H = U (U U ) −1/2 , and (2) compute A = HLH with L = diag[0.9, 0.7]. As shown in Figure S1 , for sample sizes as large as 15, 000, the empirical size of the feasible FMOLS estimator seems to stabilisze at 11% whereas the infeasible estimator FMOLS(θ 0 ) yields an empirical size close to 5%. These results indicate poor finite sample performance of FMOLS or possible even a lack of asymptotic validity. 22 We start the VAR recursions from u 0 v 0 = 0 and use a presample of 50 observations to reduce the influence of these initial values. S6.1 Empirical size for DGP1 when θ = 0.8 Table S1 : The empirical size (in %) of the single-equation t-tests H 0 : β 2,1 = 0 and the joint Wald tests for H 0 : β 2,1 = . . . = β 2,N = 0 with β 2,i denoting the coefficient in front of x 2 i,t . The Monte Carlo results are based on: simulated inference with θ estimated by NLS (SimNLS), simulated inference with known θ = 0.8 (SimNLS(θ 0 )), and two Fully Modified estimators for systems developed by Wagner et al. (2020) with known θ = 0.8 (FM-SOLS(θ 0 ) and FM-SUR(θ 0 )). S6.2 Empirical size for DGP1 when θ = 1.8 Table S2 : The empirical size (in %) of the single-equation t-tests H 0 : β 2,1 = 0 and the joint Wald tests for H 0 : β 2,1 = . . . = β 2,N = 0 with β 2,i denoting the coefficient in front of x 2 i,t . The Monte Carlo results are based on: simulated inference with θ estimated by NLS (SimNLS), simulated inference with known θ = 1.8 (SimNLS(θ 0 )), and two Fully Modified estimators for systems developed by Wagner et al. (2020) with known θ = 1.8 (FM-SOLS(θ 0 ) and FM-SUR(θ 0 )). (θ0) The parameters of simulation DGPs have been selected according to the following general procedure. Step 1: Load the data and estimate the model corresponding to the specification under H 0 . The resulting coefficients γ T and residual series { u t } are stored. (b) Set ∆x 0 = 0, construct the increments of the integrated explanatory variables through the recursion ∆x t = A (2) ∆x t−1 + ξ (2) t , and compute the partial sums x t = t s=1 ∆x s . Given the simulated innovations and simulated integrated regressors, it remains to use γ T to obtain the simulated dependent variables. Three remarks follow. First, we explicitly choose individual VAR(1) models for { u t } and { ∆x t } rather than a single 2N-dimensional VAR(1) for the joint vector. Otherwise, with N = 6 and T = 145 in the data, the number of parameters in the autoregressive matrix would be 12 2 = 144 which is rather close to the length of the data series. Similarly, the selection of the VAR(1) specification results from a trade-off between model parsimony and a simulation DGP with serial correlation. Second, we follow the literature and rebuild the integrated explanatory variables as random walks without drift. Third, the specific values of all parameters (rounded to 2 decimals) are reported in the next subsections. The estimated parameters of the model are The results for the VAR(1) specifications follow The estimated model specification is The VAR(1) dynamics in the innovations and increments are governed by The simulation experiments regarding the performance of the KPSS test are based on the same specification as DGP2(a). S8.1 Unit root tests Table S3 : The t-statistics for the ADF and DF-GLS unit root tests. The columns with header 'const ' and 'const & trend' refer to the inclusion of only an intercept or both intercept and linear trend. Rejection of the unit root hypothesis at a 10% and 5% level are indicated with one and two stars, respectively. Figure S2 : Overview graphs for Austria over 1870-2014. (a) log(GDP) versus log(CO 2 ) (both per capita). (b) As subfigure (a) but using detrended variables. (c) The log per capita CO 2 emissions time series for Austria. (d) The residual sum of squares (RSS) for the nonlinear model specification y t = τ 1 + τ 2 t + φ 1 x t + φ 2 x θ t + u t for various values of θ. (e) The RSS as a function of θ for the flexible nonlinear trend specification y t = τ 1 +τ 2 t +τ 3 t θ +φx t +u t . (f) The relation between x t and y t after partialling out the constant, linear trend, and flexible deterministic trend. The residual sum of squares (RSS) for the nonlinear model specification y t = τ 1 + τ 2 t + φ 1 x t + φ 2 x θ t + u t for various values of θ. (e) The RSS as a function of θ for the flexible nonlinear trend specification y t = τ 1 +τ 2 t +τ 3 t θ +φx t +u t . (f) The relation between x t and y t after partialling out the constant, linear trend, and flexible deterministic trend. S8.4 RSS(θ) for y t = τ 1 + τ 2 t + φ 1 x t + φ 2 x θ t + u t (e) (f) Figure S4 : The residual sum of squares (RSS) for the nonlinear specification y t = τ 1 + τ 2 t + φ 1 x t + φ 2 x θ t + u t for various values of θ. This replicates Figure 4 (d) of the main paper for all countries in the data set. Continuation of Figure S4 . Results of a more in-depth univariate analysis are collected in this section. We look at models (M1 * )-(M4 * ) as listed in Table S5 . Model Specification (M1 * ) y t = τ 1 + τ 2 t + φ 1 x t + φ 2 x 2 t + u t (M2 * ) y t = τ 1 + τ 2 t + τ 3 t 2 + φ 1 x t + φ 2 x 2 t + u t (M3 * ) y t = τ 1 + τ 2 t + τ 3 t θ + φ 1 x t + φ 2 x 2 t + u t (M4 * ) y t = τ 1 + τ 2 t + τ 3 t θ + φ 1 x t + u t All three models are of the form: y t = τ 1 + τ 2 t + τ 3 t θ + φ 1 x t + φ 2 x 2 t + u t . Model (M1 * ) is the specification above with τ 3 = 0 and forces all nonlinearities to be captured through x 2 t . Specifications (M2 * ) and (M3 * ) include deterministic nonlinear time trends. For model (M2 * ), we allow for τ 3 0 but fix θ = 2. Model (S34) without further restrictions is referred to as (M3 * ). In the latter model, the NLS estimator for θ is computed by a grid search over the values Θ = [0.05, 0.95] ∪ [1.05, 10] and simulated inference is used (see Section 3.2 of the main paper). Table S6 illustrates how increasingly flexible nonlinear deterministic trends affect the parameter estimates for φ 1 and φ 2 . Judging exclusively by the signs of φ 1 and φ 2 , the EKC exists for 17 out of 18, 9 out of 18, and 8 out of 18 countries for (M1 * ), (M2 * ), and (M3 * ), respectively. Moreover, the significance of squared log per capita GDP (read: φ 2 ) reduces when nonlinear deterministic time trends are included. For model (M3 * ), φ 2 is never significantly different from zero at a 10% level and evidence in favour of EKC becomes rather meagre. The results of the univariate KPSS tests for these models can be found in Table S6 under "Stationarity tests". In general, the cointegrating relations seem well-specified except maybe for Belgium, Denmark, and UK. The insignificance of φ 2 in model (M3 * ) suggests a final model specification, namely y t = τ 1 + τ 2 t + τ 3 t θ + φ 1 x t + u t . (M4 * ) Model (M4 * ) specifies a linear cointegrating relation around a flexible time trend and does not incorporate nonlinear effects in log per capita GDP. 23 That is, the model specification does not allow for an EKC. As before, we check parameter estimates and test for stationarity of the error terms (the columns labeled "(M4 * )" in Table S6 ). Some remarks concerning this final model specification are: 1. For Belgium, the fitted model reads y t = −0.049 + 0.0063t − 6.131 × 10 −6 t 2.603 + 1.006x t +û t . The flexible power on the linear trend is estimated to be θ = 2.603 resulting in nonlinear behaviour over time. Moreover, the negative coefficient in front of t 2.603 provides a contribution that is sloping down over time. If time effects are ignored, then a 1% increase in GDP will lead to an estimated 1.006% increase in fossil-fuel CO 2 emissions. 2. The outcomes of the KPSS test do not point towards a misspecified cointegrating relation (Table S6 ). The flexible deterministic trend is generally sufficient to describe the nonlinear behaviour of the (univariate) log per capita CO 2 emissions over time, that is, squared log per capita GDP is not needed in the univerariate models. Visual proof is found in Figures 4(a) , 4(b) and 4(f) where the incorporation of increasingly flexible time effects is seen to remove any apparent nonlinear relationship between log per capita GDP and CO 2 emissions. Visualisations of the model fits are available in Figures S7-S11. 23 Model specification (M4 * ) has the additional advantage of being invariant to the possible presence of a drift component in log per capita GDP, also see footnote 13 of the main text. S26 Figure S9 : The residual series for each country under model specification (M2): y S27 Figure S10 : The residual series for each country under model specification (M3): y S28 Figure S11 : The residual series for each country under model specification (M4): y S8.6 Nonparametric kernel estimator and linear fit Figure S12 : The 95% (point-wise) confidence intervals of the non-parametric kernel estimate for the relationship between GDP and CO 2 emissions (blue) after removal of the country-specific and joint flexible deterministic trends. The red dotted line is the linear fit. Results are based on the full sample. Figure S13 : The 95% (point-wise) confidence intervals of the non-parametric kernel estimate for the relationship between GDP and CO 2 emissions (blue) after removal of the country-specific and joint flexible deterministic trends. The red dotted line is the linear fit. Results are based on observations after World War II. Calculus: A Complete Course The investigation of Environmental Kuznets Curve hypothesis in the advanced economies: The role of energy prices Estimation and inference with weak, semi-strong, and strong identification Heteroskedasticity and autocorrelation consistent covariance matrix estimation Adaptive local polynomial whittle estimation of long-range dependence Testing linearity using power transforms of regressors Combining p-values to test for multiple structural breaks in cointegrated regressions GLS estimation and confidence sets for the date of a single break in models with trends The Environmental Kuznets Curve: seeking empirical regularity and theoretical structure Nonlinear regressions with nonstationary time series Note: Asterisks denote rejection of the null hypothesis at the * * * 1%, * * 5%, and * 10% significance level Note: Asterisks denote rejection of the null hypothesis at the * * * 1%, * * 5%, and * 10% significance level Proof For r ∈ (0, 1], we define f (r) = r a (ln r) k . Two partial sum processes are defined as S i,T (r) = and hencewhere we used the fact that S i,T (·) is piecewise constant. In view of Assumption 2, we can extend suitably extend the probability space and have the following uniformly strong approximation of the partial sum process S i,T (see for example page 562 of Phillips (2007)):for q > 2. Continuing from (S13), this uniformly strong approximation gives Figure S1 : The empirical size of feasible and infeasible FMOLS estimators for a large range of sample sizes. We comment on the asymptotic properties of the FMOLS estimator when N = 1. To shorten notation, the subscript 'i' in x i,t , y i,t and p i will be omitted. We analyse the asymptotic properties of D θ 0 ,T. . , x p 1 1,t ] , and y + t and A * are second-order bias corrections. That is,vv Ω vu . We now investigate how the estimation of θ affects the limiting distribution of the FMOLS estimator. By straightforward linear algebra manipulations, we findProof (i) We can always add and subtract such that the LHS of (i) readsLemma S2(iii) implies that the first term in the RHS of (S29) converges to j(r; θ 0 ) j(r; θ 0 ) dr. It remains to show that the term in parenthesis vanishes.the Cauchy-Schwarz inequality, we havewhere we used the mean-value theorem and Lemma S1(i). The claim follows. (ii) Ω uv and Ω vv consistently estimate Ω uv and Ω vv , respectively (Theorem 2). It therefore suffices to look at D −1where BThe typical elements in the vector on the RHS are of the form 1x it √ T j τ g,0 t θ T − S11 t θ 0 . We show that both contributions are O p (ln T ). By the mean-value theorem and Lemma S1(i),Similarly, from the mean-value theorem and Cauchy-Schwartz inequality, we see thatFrom (S32) and (S33) we conclude that D −1. The absolute value of the nonzero element can be bounded as follows(v) By similar steps as before, and invoking Theorem 2, it is easy to show that it suffices to boundThis establishes (v).The currents upper bounds in the lemma above suggest that the RHS of (S28) does not converge to a Gaussian mixture limiting distribution. The problematic expression is Lemma S4(iii). That is, if θ is estimated, then z t θ T − z t θ 0 does not convergence sufficiently fast to zero to obtain the standard stochastic integral.