title: Nearly Unstable Integer-Valued ARCH Process and Unit Root Testing
authors: Barreto-Souza, Wagner; Chan, Ngai Hang
date: 2021-07-16

This paper introduces a Nearly Unstable INteger-valued AutoRegressive Conditional Heteroskedasticity (NU-INARCH) process for dealing with count time series data. It is proved that a proper normalization of the NU-INARCH process, endowed with the Skorokhod topology, converges weakly to a Cox-Ingersoll-Ross diffusion. The asymptotic distribution of the conditional least squares estimator of the correlation parameter is established as a functional of certain stochastic integrals. Numerical experiments based on Monte Carlo simulations are provided to verify the behavior of the asymptotic distribution in finite samples. These simulations reveal that the nearly unstable approach provides satisfactory results, and better ones than those based on the stationarity assumption, even when the true process is not that close to non-stationarity. A unit root test is proposed and its Type-I error and power are examined via Monte Carlo simulations. As an illustration, the proposed methodology is applied to the daily number of deaths due to COVID-19 in the United Kingdom.

First-order nearly unstable continuous autoregressive processes have been well explored in the literature; see for example Chan and Wei (1987), Phillips (1987), Chan, Ing and Zhang (2019), and the references therein. In these works, it is assumed that the model approaches the non-stationarity region as the sample size increases. More specifically, a nearly unstable continuous process $\{Y_t^{(n)}\}_{t\in\mathbb{N}}$ is defined by $Y_t^{(n)} = \rho_n Y_{t-1}^{(n)} + \eta_t$, where $\{\eta_t\}_{t\in\mathbb{N}}$ is a white noise and $\rho_n = 1 - b/n$, for $b > 0$. In the past few years, nearly unstable discrete processes have emerged based on the INteger-valued AutoRegressive (INAR) approach (McKenzie, 1985; Al-Osh and Alzaid, 1987).
The first attempt on this subject was due to Ispány, Pap and Van Zuijlen (2003). More specifically, a nearly unstable INAR(1) process $\{X_t^{(n)}\}_{t\in\mathbb{N}}$ is defined by $X_t^{(n)} = \alpha_n \circ X_{t-1}^{(n)} + \varepsilon_t^{(n)}$, where $\circ$ is the thinning operator proposed by Steutel and van Harn (1979), given by $\alpha_n \circ X_{t-1}^{(n)} = \sum_{j=1}^{X_{t-1}^{(n)}} B_{j,t}^{(n)}$ with $\{B_{j,t}^{(n)}\}_{j,t\in\mathbb{N}} \overset{iid}{\sim} \mathrm{Bernoulli}(\alpha_n)$, for $\alpha_n \in (0, 1)$, and $\{\varepsilon_t^{(n)}\}_{t\in\mathbb{N}}$ is a sequence of independent and identically distributed (iid) random variables with $\varepsilon_t^{(n)}$ being independent of the counting series $\{B_{j,k}^{(n)}\}_{j\in\mathbb{N}}$ for all $k \le t$, for $t \in \mathbb{N}$. These authors assumed that $\alpha_n$ approaches 1 (non-stationarity) as $n \to \infty$, as in Chan and Wei (1987) in the continuous context. Assuming that $\mu \equiv E(\varepsilon_t)$ is known, the conditional least squares (CLS) estimator of $\alpha_n$ was explored by Ispány, Pap and Van Zuijlen (2003). They showed that, under near non-stationarity and assuming a finite second moment for $\varepsilon_t$, the CLS estimator converges weakly to a normal distribution at the rate $n^{3/2}$. Other related works dealing with nearly unstable INAR (Galton-Watson/branching) processes are due to Wei and Winnicki (1990), Winnicki (1991), Ispány, Pap and Van Zuijlen (2005), Rahimov (2007), Rahimov (2008), Drost, Van Den Akker and Werker (2009), Rahimov (2009), Barczy, Ispány and Pap (2011), Ispány, Körmendi and Pap (2014), Barczy, Ispány and Pap (2014), Guo and Zhang (2014), and Barczy, Körmendi and Pap (2016). Practical situations demonstrating evidence of a nearly unstable INAR model are discussed for instance by Hellström (2001).

Another popular way of dealing with count time series data is the INteger-valued Generalized AutoRegressive Conditional Heteroskedastic (INGARCH) models by Ferland, Latour and Oraichi (2006), Fokianos, Rahbek and Tjøstheim (2009), Fokianos and Fried (2010), Zhu (2011), Fokianos and Tjøstheim (2011), Zhu (2012), Christou and Fokianos (2015), Gonçalves et al. (2015), Davis and Liu (2016), Silva and Barreto-Souza (2019), Weiß et al.
(2020), which constitute in some sense an integer-valued counterpart of the classical GARCH models by Bollerslev (1986). The INGARCH methodology is the focus of this paper. Like the existing literature on nearly unstable continuous and INAR processes that assumes first-order autoregressive dependence, in this paper we consider the first-order autoregressive version of the INGARCH approach, which is known as INARCH(1).

The paper is organized as follows. In Section 2, the NU-INARCH model is introduced and a fluctuation theorem is established, which involves the Cox-Ingersoll-Ross diffusion process. The asymptotic distribution of the CLS estimator for the correlation parameter is derived in Section 3 under the nearly unstable and stationarity assumptions. Section 4 provides simulated results about the asymptotic distribution of the CLS estimator under both nearly unstable and stationary approaches and also compares them in terms of confidence interval coverages. A unit root test for the INARCH process is proposed in Section 5 and its performance is evaluated via Monte Carlo simulations. An empirical application to the daily number of deaths due to COVID-19 in the United Kingdom, which exhibits a nearly unstable/non-stationary behavior, is provided in Section 6. Concluding remarks and future research are addressed in Section 7.

In this section, we define the nearly unstable INARCH process and obtain its weak convergence (under a proper normalization) in the space of non-negative càdlàg functions endowed with the Skorokhod topology.

Definition 2.1. We say that a sequence $\{X_t^{(n)}\}_{t\in\mathbb{N}}$ is a first-order nearly unstable integer-valued ARCH process (in short NU-INARCH) if

$X_t^{(n)} \mid \mathcal{F}_{t-1}^{(n)} \sim \mathrm{Poisson}(\lambda_t^{(n)}), \quad \lambda_t^{(n)} = \beta + \alpha_n X_{t-1}^{(n)}$, (1)

where $\mathcal{F}_{t-1}^{(n)} = \sigma\{X_{t-1}^{(n)}, \ldots, X_0^{(n)}\}$, $\beta > 0$, and

$\alpha_n = 1 - \gamma_n/n$, (2)

with $\lim_{n\to\infty} \gamma_n = \gamma > 0$, and $X_0^{(n)} = \kappa \in \mathbb{N}$ (constant starting value).

Remark 2.1.
For the nearly unstable INARCH model defined above, we have that corr(X [...] The parameterization of $\alpha_n$ in (2) was first proposed by Chan and Wei (1987) and subsequently used in Ispány, Pap and Van Zuijlen (2003).

In the next proposition, we provide the mean, variance, and autocorrelation function of the NU-INARCH process. These results will be important to establish the proper normalization in order to obtain a non-trivial limit for the counting process.

Proposition 2.2. Let $\{X_t^{(n)}\}_{t\in\mathbb{N}}$ be a nearly unstable INARCH process. Then, its marginal mean and variance, and autocorrelation function are given respectively by [...]

Proof. We have that $E(X_t^{(n)}) = \beta + \alpha_n E(X_{t-1}^{(n)})$. By using this recursion $t$ times, we obtain the result for the marginal mean. For the variance, it follows that [...] Finally, for $k, t \in \mathbb{N}_0$, the autocorrelation function becomes [...], where we have used in the third equality the fact that $E(X_{t+k}^{(n)} \mid \mathcal{F}_{t+k-1}^{(n)}) = \beta + \alpha_n X_{t+k-1}^{(n)}$ for $k \ge 1$.

From Proposition 2.2, we have that $E(X_{\lfloor nt \rfloor}^{(n)}) = O(n)$ and $\mathrm{Var}(X_{\lfloor nt \rfloor}^{(n)}) \approx \beta\gamma^{-2} n^2 (1 - e^{-\gamma t})^2/2 = O(n^2)$. We then define the normalized process $X^{(n)}(t) \equiv X_{\lfloor nt \rfloor}^{(n)}/n$ and obtain that $X^{(n)}(t) = O_p(1)$, for $t \ge 0$. In the following theorem, we establish the weak convergence of the process $\{X^{(n)}(t); t \ge 0\}$ as $n \to \infty$. We introduce some notation before presenting this result. Denote by $D^+[0, \infty)$ the space of non-negative càdlàg (right continuous with left limits) functions on $[0, \infty)$ and by $C_c^\infty[0, \infty)$ the space of infinitely differentiable functions on $[0, \infty)$ having compact support.

Theorem 2.3. The stochastic process $\{X^{(n)}(t); t \ge 0\}$ converges weakly in $D^+[0, \infty)$, endowed with the Skorokhod topology, to a diffusion process $\{\mathcal{X}(t); t \ge 0\}$ given by the solution of the stochastic differential equation

$d\mathcal{X}(t) = (\beta - \gamma \mathcal{X}(t))\,dt + \sqrt{\mathcal{X}(t)}\,dB(t)$, (3)

where $\{B(t); t \ge 0\}$ is a standard Brownian motion and $\mathcal{X}(0) = 0$.

Remark 2.4. The process $\{\mathcal{X}(t); t \ge 0\}$ appearing in Theorem 2.3, Equation (3), is known in the literature as the Cox-Ingersoll-Ross (CIR) process (Cox, Ingersoll and Ross, 1985).

Proof of Theorem 2.3. [...]
From Theorem 6.5 in Chapter 1 and Corollary 8.9 in Chapter 4 of Ethier and Kurtz (1986), to obtain the desired result, it is enough to show that [...] (4), where $h'(\cdot)$ and $h''(\cdot)$ denote the first and second derivatives of $h(\cdot)$, respectively. For $Z_x^{(n)} \neq x$, we have that [...] By combining (5) and (6), we obtain that [...] Note that Equation (7) also holds for $Z_x^{(n)} = x$. Further, we can write [...] We now use Equations (7) and (8) to express $\varepsilon_n(x)$ as follows: [...] We will show that $\lim_{n\to\infty} \sup_{x\in E_n} |\varepsilon_n^{(j)}(x)| = 0$, for $j = 1, 2, 3$. This result, Equation (9), and the triangle inequality imply that (4) holds and therefore conclude the proof of the theorem.

To show the case $j = 1$, we argue as in the proof of Theorem 3.1 in Chapter 9 of Ethier and Kurtz (1986). Then, the result follows by showing that $\lim_{n\to\infty} |\varepsilon_n^{(1)}(x_n)| = 0$ [...] These results give us that the right-hand side of (10) goes to 0 as $n \to \infty$. We obtain the same conclusion when [...] via its characteristic function as follows: [...] Hence, the integrand in $\varepsilon_n^{(1)}(x_n)$ is bounded above by an integrable random variable. Further, this integrand converges in probability to 0 since $Z_{x_n}^{(n)} - x_n \xrightarrow{p} 0$. We then apply the Dominated Convergence Theorem to conclude that $\lim_{n\to\infty} |\varepsilon_n^{(1)}(x_n)| = 0$ [...] as $n \to \infty$. In a similar fashion, for $j = 3$, it can be shown that $\lim_{n\to\infty} \sup_{x\in E_n} |\varepsilon_n^{(3)}(x)| = 0$, which concludes the proof.

In this section, we provide the asymptotic distribution of the conditional least squares estimator of $\alpha_n$ for the nearly unstable INARCH process. The parameter $\beta$ is assumed to be known. It can be seen as a nuisance parameter, since our main interest lies in the parameter $\alpha_n$ that controls the dependence in the model. In the empirical illustration, we discuss how to deal with the unknown $\beta$ case.
The CLS estimator of $\alpha$ is obtained by minimizing the Q-function given by $Q_n(\alpha) = \sum_{t=2}^{n} (X_t - \beta - \alpha X_{t-1})^2$. Hence, we obtain explicitly the CLS estimator of $\alpha$, say $\hat\alpha_n$, which is given by

$\hat\alpha_n = \dfrac{\sum_{t=2}^{n} X_{t-1}(X_t - \beta)}{\sum_{t=2}^{n} X_{t-1}^2}$. (11)

We begin by deriving the asymptotic distribution of $\hat\alpha_n$ under the stationarity assumption, where we denote the count time series by $\{X_t\}_{t\in\mathbb{N}}$ (no need for the superscript $(n)$). This case will be contrasted with the nearly unstable INARCH process through simulation in the following section.

Theorem 3.1. Assume that $X_1, \ldots, X_n$ is a trajectory from a stationary Poisson INARCH(1) model, that is, $\alpha_n = \alpha < 1$. Then, the CLS estimator $\hat\alpha_n$ given in (11) satisfies [...]

Proof. From Fokianos, Rahbek and Tjøstheim (2009), we have that $\{X_t\}$ is strictly stationary and ergodic since $\alpha < 1$. Hence, we can use Theorem 3.2 from Tjøstheim (1986) to establish the asymptotic normality of the CLS estimator $\hat\alpha_n$. The other conditions necessary to obtain this weak convergence can be straightforwardly checked in our case and are therefore omitted. Applying this theorem, we get that the asymptotic variance, say $\sigma^2$, assumes the form [...] for the marginal moments of a Poisson INARCH(1) model are given in Weiß (2010). Using these results [...]

[...] for $t \in \mathbb{N}_0$ and $s \ge 0$, where $\lfloor x \rfloor$ denotes the integer part of $x \in \mathbb{R}$. As in the nearly unstable INAR process by Ispány, Pap and Van Zuijlen (2003), we can express $\hat\alpha_n - \alpha_n$ as [...] (12). In the following lemma, we provide the asymptotic behavior of the autocovariance function of the process $\{W^{(n)}(s); s \ge 0\}$; note that $E(W^{(n)}(s)) = 0$. This will be important to identify the proper normalization of $\hat\alpha_n - \alpha_n$ in (12) yielding a non-trivial weak limit.

Lemma 3.2. [...], with $a_n \approx b_n$ denoting that $\lim_{n\to\infty} a_n/b_n = 1$ for real sequences $\{a_n\}$ and $\{b_n\}$.

Proof. It is straightforward that $E(W^{(n)}(s)) = 0$ and cov(W [...], where the last equality follows from the expression of the covariance given in Proposition 2.2.
After using the expression of the variance given in that proposition, we obtain that Var(W^{(n)} [...] From the above results and Proposition 2.2, we obtain that [...]

Lemma 3.2 and Theorem 2.3 give us that $\hat\alpha_n - \alpha_n = O_p(n^{-1})$. We are now able to establish the asymptotic distribution of the CLS estimator $\hat\alpha_n$ under the nearly unstable INARCH process as follows.

Theorem 3.3. Let $\{\mathcal{X}(t); t \ge 0\}$ be the diffusion process given in (3). Then, the CLS estimator $\hat\alpha_n$ satisfies the following weak convergence [...] (13) as $n \to \infty$, where $d\mathcal{W}(t) = \sqrt{\mathcal{X}(t)}\,dB(t)$, for $t > 0$, with $\mathcal{W}(0) = 0$.

Proof. Define $\mathcal{W}^{(n)}(s) = W^{(n)}(s)/n$, for $s > 0$. We have that [...], where both numerator and denominator have the same order of magnitude $O_p(n^2)$. For $s > 0$, it follows that [...] and then $\mathcal{W}^{(n)}(s)$ can be expressed by [...] Define the functions $\Phi_n$ ($n = 1, 2, \ldots$) and $\Phi$ mapping $D^+[0, \infty)$ into $D(\mathbb{R}_+, \mathbb{R}^2)$ as $\Phi_n(x)(s) = (x(s),$ [...] It follows that $(X^{(n)}(s), \mathcal{W}^{(n)}(s)) = \Phi_n(X^{(n)})(s)$. Using the fact that the CIR process has almost surely continuous trajectories and arguments similar to those given in the proof of Proposition 4.1 of Ispány, Pap and Van Zuijlen (2003), we obtain that $\Phi_n(X^{(n)})$ converges weakly to $\Phi(\mathcal{X})$ as $n \to \infty$. In particular, we have that $\mathcal{W}^{(n)}(s)$ converges weakly to $\mathcal{W}(s) = \mathcal{X}(s) + \gamma \int_0^s \mathcal{X}(u)\,du - \beta s$. From the definition of $\mathcal{X}$, we have that $\gamma \int_0^s \mathcal{X}(u)\,du = -\mathcal{X}(s) + \beta s + \int_0^s \sqrt{\mathcal{X}(u)}\,dB(u)$ and, therefore, $\mathcal{W}(s) = \int_0^s \sqrt{\mathcal{X}(u)}\,dB(u)$. In other words, $d\mathcal{W}(t) = \sqrt{\mathcal{X}(t)}\,dB(t)$. The above results and the continuous mapping theorem give us that [...] The above arguments are straightforwardly extended to establish the joint weak convergence [...] Then, the desired result given in (13) is obtained by applying the continuous mapping theorem.

In this section, we present simulated results illustrating the behavior of the asymptotic distributions of the normalized CLS estimator under the nearly unstable and stable cases. All the numerical results of this paper were obtained using the statistical software R (R Development Core Team, 2021).
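The limiting diffusion in Theorem 3.3 can be explored numerically. What follows is a minimal Python sketch (not the authors' R code) of an Euler scheme for a CIR-type equation $dX = (\beta - \gamma X)\,dt + \sqrt{X}\,dB$, together with a left-endpoint Riemann sum for a time integral of the path. The function names, the step count, the clipping at zero, and the starting value $X(0) = 0$ are assumptions made for illustration only.

```python
import math
import random


def simulate_cir(beta, gamma, steps, rng, x0=0.0, horizon=1.0):
    """Euler path of dX = (beta - gamma*X) dt + sqrt(X) dB on [0, horizon].

    Assumption: each new value is clipped at zero so that sqrt stays defined.
    """
    dt = horizon / steps
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        db = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        nxt = x + (beta - gamma * x) * dt + math.sqrt(x) * db
        path.append(max(nxt, 0.0))
    return path


def riemann_integral(path, horizon=1.0):
    """Left-endpoint Riemann sum approximating the time integral of the path."""
    dt = horizon / (len(path) - 1)
    return dt * sum(path[:-1])


if __name__ == "__main__":
    rng = random.Random(2021)
    path = simulate_cir(beta=1.0, gamma=2.0, steps=1000, rng=rng)
    print("X(1) ~", path[-1])
    print("integral of X over [0, 1] ~", riemann_integral(path))
```

Clipping at zero is only one of several common ways to keep a discretized square-root diffusion non-negative; a finer grid reduces how often it is triggered.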
We conduct Monte Carlo simulations with 10000 replications, where we generate Poisson INARCH(1) trajectories with $\beta = 1$, $\alpha = 0.98, 0.99, 0.999$, and initially a sample size of $n = 500$. Note that the chosen values of $\alpha$ indicate nearly unstable count processes. For each replication, we compute the CLS estimate of $\alpha$ using (11) and then its standardized versions $n(\hat\alpha_n - \alpha)$ and $\sqrt{n}(\hat\alpha_n - \alpha)$, according to the nearly unstable (Theorem 3.3) and stable/stationary (Theorem 3.1) cases, respectively. A generator for the asymptotic distribution given on the right-hand side of (13) was implemented, where the stochastic integrals are approximated by Riemann-type sums. With this generator we can, for instance, obtain quantiles of the limit distribution and plot its density by generating samples and applying a non-parametric density estimator (here with a Gaussian kernel); both are needed in what follows.

We present the histograms and qq-plots of the standardized CLS estimates along with their associated asymptotic density/quantiles under the stable and nearly unstable cases in Figures 1 and 2, respectively. From Figure 1, it is evident that the normal approximation is not adequate and that it worsens as $\alpha$ gets closer to 1, which is expected since these results are based on stationarity. On the other hand, the histograms and qq-plots for the nearly unstable approximation given in Figure 2 show an excellent agreement between the empirical standardized estimates and the theoretical asymptotic distribution for all scenarios.

Table 1: Empirical coverages of the 90%, 95%, and 99% confidence intervals for $\alpha$ based on the nearly unstable approach. Sample size $n = 500$.

A natural question is what happens when $\alpha$ is not close to 1. To address this point, we run additional simulations with $\alpha = 0.7, 0.8, 0.9$, and the remaining settings as before.
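One replication of the experiment described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R implementation: the estimator is the least squares fit of $X_t - \beta$ on $X_{t-1}$ with $\beta$ known, the starting value is set to 1, and the helper names (`rpois`, `simulate_inarch1`, `cls_alpha`) are hypothetical.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate; large means are split into chunks of at most 30."""
    total = 0
    while lam > 0.0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit = math.exp(-chunk)  # Knuth's multiplication method on the chunk
        prod = rng.random()
        while prod > limit:
            total += 1
            prod *= rng.random()
    return total


def simulate_inarch1(n, alpha, beta, x0, rng):
    """Return [X_0, ..., X_n] with X_t | past ~ Poisson(beta + alpha * X_{t-1})."""
    x = [x0]
    for _ in range(n):
        x.append(rpois(beta + alpha * x[-1], rng))
    return x


def cls_alpha(x, beta):
    """CLS estimate of alpha for known beta: regress X_t - beta on X_{t-1}."""
    num = sum(prev * (cur - beta) for prev, cur in zip(x[:-1], x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    if den == 0:
        raise ValueError("all lagged counts are zero; estimate undefined")
    return num / den


if __name__ == "__main__":
    rng = random.Random(123)
    n, alpha, beta = 500, 0.98, 1.0
    x = simulate_inarch1(n, alpha, beta, x0=1, rng=rng)
    a_hat = cls_alpha(x, beta)
    print("nearly unstable scale:", n * (a_hat - alpha))
    print("stationary scale:", math.sqrt(n) * (a_hat - alpha))
```

Repeating the last block over many seeds gives the empirical distributions that the histograms and qq-plots compare with the two asymptotic approximations.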
Figures 3 and 4 exhibit histograms and qq-plots of the standardized CLS estimates of $\alpha$ obtained from a Monte Carlo simulation for the stationary and nearly non-stationary Poisson INARCH processes. From Figure 3, we observe some deviation from normality even for the case $\alpha = 0.7$. This is clearly visible in the qq-plots. Surprisingly, the results based on the nearly unstable methodology work quite satisfactorily even for $\alpha = 0.7$. The same conclusion can be drawn from Figure 4, where we note a good agreement between the empirical standardized CLS estimates and the theoretical asymptotic distribution derived in Theorem 3.3.

All the configurations considered here are repeated with a sample size $n = 1000$. Figures 5 and 6 give the histograms and qq-plots of the standardized CLS estimates under the stable and nearly unstable Poisson INARCH processes, respectively, under the settings $\alpha = 0.98, 0.99, 0.999$. The plots for the settings $\alpha = 0.7, 0.8, 0.9$ for the stable and nearly unstable cases are reported in Figures 7 and 8, respectively. The conclusions are quite similar to the case $n = 500$ for the configurations close to non-stationarity, $\alpha = 0.98, 0.99, 0.999$. Regarding the configurations where $\alpha = 0.7, 0.8, 0.9$, although there is an improvement in the results based on the stationary case (compared to $n = 500$), deviations from normality can still be observed. In contrast, the nearly unstable approach again works very well and provides the best outcomes. In short, we recommend using the nearly unstable approach even when the fitted model may in practice not be too close to the non-stationarity region, because the proposed methodology works well and performs better than the stationary-based approach.

Our interest now is to evaluate the coverages of the confidence intervals based on the asymptotic results under the nearly unstable and stable assumptions.
In Table 1, we provide the empirical coverages of confidence intervals for $\alpha$, from a Monte Carlo simulation with 10000 replications, at significance levels of 10%, 5%, and 1% based on Theorem 3.3 (under near non-stationarity). The sample size is $n = 500$ and we consider $\alpha = 0.999, 0.99, 0.98, 0.9, 0.8, 0.7$. These results show that inference on the correlation parameter using our methodology is satisfactory, since the coverages are close to the nominal levels for all cases considered, even when $\alpha$ is not close to the non-stationarity region.

In this section, we propose a statistical procedure for testing for a unit root in a Poisson INARCH(1) model with correlation parameter $\alpha$. The null and alternative hypotheses are respectively $H_0: \alpha = 1$ and $H_1: \alpha < 1$. To this end, we consider the nearly unstable approach and the statistic $n(\hat\alpha_n - 1)$, which is inspired by the traditional unit root test for the continuous AR(1) model of Dickey and Fuller (1979). Under the conditions of Theorem 3.3, we have that [...] (14) as $n \to \infty$, where $D_\gamma$ is a random variable (depending on the parameter $\gamma$) following the asymptotic distribution given on the right-hand side of (13). We can approach the null hypothesis of interest through our methodology by taking $\gamma \to 0$. In this case, the distribution of the right-hand side of (14) approaches that of $D_0$, whose associated process $\mathcal{X}$ satisfies the stochastic differential equation $d\mathcal{X}(t) = \beta\,dt + \sqrt{\mathcal{X}(t)}\,dB(t)$. Denote by $q_\zeta$ the $\zeta$-quantile of the distribution of $D_0$, for $\zeta \in (0, 1)$, that is, $P(D_0 \le q_\zeta) = \zeta$. These quantiles can be obtained by Monte Carlo simulation as done in Section 4. Based on the above discussion, we propose the following decision rule for testing $H_0: \alpha = 1$ against $H_1: \alpha < 1$ at significance level $\zeta \times 100\%$:

• Reject $H_0$ in favor of $H_1$ if $n(\hat\alpha_n - 1) < q_\zeta$.

To evaluate the finite-sample performance of the proposed unit root test (URT), we run a Monte Carlo simulation with 10000 replications.
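The decision rule can be illustrated with a short Python sketch. It deliberately swaps in a different way of obtaining the critical value: instead of sampling the limit variable $D_0$, it simulates the statistic $n(\hat\alpha_n - 1)$ directly from Poisson INARCH(1) paths with $\alpha = 1$ and takes an empirical quantile. This finite-sample stand-in, the helper names, the starting value 1, and the small replication count are all assumptions for illustration, not the authors' procedure.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate; large means are split into chunks of at most 30."""
    total = 0
    while lam > 0.0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit = math.exp(-chunk)
        prod = rng.random()
        while prod > limit:
            total += 1
            prod *= rng.random()
    return total


def simulate_inarch1(n, alpha, beta, x0, rng):
    """Return [X_0, ..., X_n] with X_t | past ~ Poisson(beta + alpha * X_{t-1})."""
    x = [x0]
    for _ in range(n):
        x.append(rpois(beta + alpha * x[-1], rng))
    return x


def cls_alpha(x, beta):
    """CLS estimate of alpha for known beta (assumes some lagged count is non-zero)."""
    num = sum(prev * (cur - beta) for prev, cur in zip(x[:-1], x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den


def urt_statistic(x, beta):
    """Test statistic n * (alpha_hat - 1), with n the number of transitions."""
    n = len(x) - 1
    return n * (cls_alpha(x, beta) - 1.0)


def null_quantile(n, beta, x0, level, reps, rng):
    """Empirical level-quantile of the statistic simulated under alpha = 1."""
    stats = sorted(
        urt_statistic(simulate_inarch1(n, 1.0, beta, x0, rng), beta)
        for _ in range(reps)
    )
    return stats[max(0, math.ceil(level * reps) - 1)]


def reject_unit_root(stat, q):
    """Reject H0: alpha = 1 when the statistic falls below the critical value."""
    return stat < q


if __name__ == "__main__":
    rng = random.Random(5)
    q05 = null_quantile(n=100, beta=1.0, x0=1, level=0.05, reps=500, rng=rng)
    x = simulate_inarch1(100, 0.9, 1.0, 1, rng)
    stat = urt_statistic(x, 1.0)
    print("critical value:", q05, "statistic:", stat, "reject:", reject_unit_root(stat, q05))
```

Counting rejections over many series simulated with $\alpha = 1$ estimates the Type-I error, and doing the same with $\alpha < 1$ estimates the power.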
We set $\beta = 1$ and sample sizes $n = 50, 80, 100, 200, 300, 400, 500, 1000, 2000, 5000$. In Table 2, we provide the empirical significance levels for nominal levels of 10%, 5%, and 1%. We observe that the URT yields the desired Type-I error even for small sample sizes (for instance, $n = 50, 80$). To investigate the power of the test, another Monte Carlo simulation is run under the same setup as before and at a significance level of 5%. We consider $\alpha = 0.999, 0.99, 0.98, 0.95, 0.9, 0.8, 0.7$ and compute the proportion of rejections of the null hypothesis in each scenario. The results are presented in Table 3.

We here apply the proposed methodology to the daily number of deaths due to COVID-19 in the United Kingdom from January 30, 2020, to June 4, 2021, yielding $n = 492$ observations. This dataset is publicly available at https://coronavirus.data.gov.uk. The plot of the daily number of deaths and its associated ACF are provided in Figure 9, which reveals a nearly unstable/non-stationary behavior. We assume that the time series comes from an NU-INARCH(1) process. The aim of this application is to illustrate that the theoretical results of this paper can reveal unit root behavior in a real dataset.

We first need to deal with $\beta$, which is unknown and can be seen as a nuisance parameter; our primary interest in this paper lies in the correlation parameter $\alpha_n$. One strategy is to estimate $\beta$ through the conditional maximum likelihood method, which consists in maximizing $\ell \propto \sum_{t=2}^{n} (y_t \log \lambda_t - \lambda_t)$, and then to treat it as known in what follows. This procedure gives $\hat\beta = 0.269$. At the end of this application, we will evaluate this approach by performing a small Monte Carlo simulation study. Using (11), we obtain the estimate of the correlation parameter $\hat\alpha_n = 0.997$, which is very close to 1.
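The two-step procedure just described (estimate $\beta$ by conditional maximum likelihood, then plug it into the CLS formula) can be sketched in Python as follows. The source does not say which optimizer was used, so the coarse grid search here is an assumption, as are the helper names and the toy counts, which are not the COVID-19 data.

```python
import math


def cond_loglik(y, alpha, beta):
    """Poisson conditional log-likelihood up to a constant: sum of y_t*log(lam_t) - lam_t."""
    total = 0.0
    for prev, cur in zip(y[:-1], y[1:]):
        lam = beta + alpha * prev  # conditional mean given the previous count
        total += cur * math.log(lam) - lam
    return total


def fit_beta_grid(y, alphas, betas):
    """Hypothetical helper: maximise over a user-supplied grid, return the beta component."""
    best = max((cond_loglik(y, a, b), a, b) for a in alphas for b in betas)
    return best[2]


def cls_alpha(y, beta):
    """CLS estimate of alpha with beta treated as known."""
    num = sum(prev * (cur - beta) for prev, cur in zip(y[:-1], y[1:]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den


if __name__ == "__main__":
    y = [3, 4, 4, 6, 5, 7, 8, 8, 9, 11]  # toy counts for illustration only
    alphas = [i / 100 for i in range(50, 121)]
    betas = [i / 20 for i in range(1, 61)]  # strictly positive so lam > 0
    beta_hat = fit_beta_grid(y, alphas, betas)
    print("beta_hat:", beta_hat, "alpha_hat (CLS):", cls_alpha(y, beta_hat))
```

A numerical optimizer would normally replace the grid; the grid is used here only to keep the sketch dependency-free.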
We obtain the standard error of the estimate $\hat\alpha_n$ (s.e.($\hat\alpha_n$)) using the asymptotic distribution stated in Theorem 3.1, which gives s.e.($\hat\alpha_n$) ≈ 0.014. We perform the URT proposed in Section 5 for testing the hypothesis $H_0: \alpha = 1$ against $H_1: \alpha < 1$. We obtain $n(\hat\alpha_n - 1) = -1.257 > -17.952 = q_{0.05}$ and therefore we do not reject the null hypothesis of a unit root at the 5% significance level. The density function of $D_0$ based on a Gaussian kernel and 100000 Monte Carlo replications is provided in Figure 10, along with vertical lines denoting the test statistic and the 0.05-quantile (of the $D_0$ distribution). The associated p-value is 0.704, which shows that we reach the same conclusion at any usual significance level. In Figure 11, we present the count time series data and the predicted means based on the fitted NU-INARCH model, which reveals a good agreement between the observed time series and the model.

Figure 9: Plot of the daily number of deaths due to COVID-19 in the UK and its associated ACF.

We conclude this application by evaluating our strategy of estimating $\beta$ and then treating it as known. To do this, we run a small Monte Carlo simulation with 1000 replications. In each loop, we generate an NU-INARCH model with $\beta = 0.269$, $\alpha = 0.997$, and $n = 492$ (the specifications of the application), construct confidence intervals for $\alpha$ based on both approaches, with fixed and non-fixed $\beta$ (the latter estimated as done in this section and then assumed known), and check whether they contain the "true" value. The empirical coverages of the 90%, 95%, and 99% confidence intervals under both approaches are reported in Table 4. As can be seen from this table, the solution proposed here in the application provides the expected nominal coverages and works even better than the fixed $\beta$ case for the 90% and 95% coverages; the 99% coverages are very close to each other.

A nearly unstable INARCH(1) process was introduced and weak convergence of a normalized version was established.
The asymptotic distribution of the CLS estimator of the correlation parameter was derived under both nearly unstable and stable cases, which have been explored via Monte Carlo simulations. We also proposed a unit root test and checked its performance in terms of Type-I error and power through simulation. The nearly unstable INARCH approach was applied to the daily number of deaths due to COVID-19 in the UK, which exhibits a non-stationary behavior. The proposed URT has provided evidence for the existence of a unit root, in agreement with the descriptive analysis. We have assumed that the conditional distribution in (1) is Poisson, but the methods presented in this paper can be easily adapted to other distributional assumptions such as negative binomial or, more generally, mixed Poisson distributions, among others. More specifically, the very same strategy given in Proposition 2.2 and Lemma 3.2 can be employed to find the proper normalizations for the processes $\{X_{\lfloor nt \rfloor}^{(n)}, t \ge 0\}$ and $\{W^{(n)}(t), t \ge 0\}$ in these other cases. After obtaining these results, the asymptotic distributions of the normalized count process and CLS estimator are established following the same steps as those given in Theorems 2.3 and 3.3, respectively. We also believe that extending the results to higher-order INGARCH models deserves future investigation.

First-order integer valued autoregressive (INAR(1)) process
Asymptotic behavior of unstable INAR(p) processes. Stochastic Processes and their Applications
Asymptotic behavior of conditional least squares estimators for unstable integer-valued autoregressive models of order 2
Statistical inference for critical continuous state and continuous time branching processes with immigration
Generalized autoregressive conditional heteroscedasticity
Asymptotic inference for nearly nonstationary AR(1) processes
Nearly unstable processes: A Prediction perspective
Quasi-likelihood inference for negative binomial time series models
Estimation and testing linearity for non-linear mixed Poisson autoregressions
A theory of the term structure of interest rates
Theory and inference for a class of nonlinear models with applications to time series of counts
Distribution of the estimators for autoregressive time series with a unit root
The asymptotic structure of nearly unstable nonnegative integer-valued AR(1) models
Markov Processes: Characterization and Convergence
Integer-valued GARCH process
Interventions in INGARCH processes
Poisson autoregression
Log-linear Poisson autoregression
Infinitely divisible distributions in integer-valued GARCH models
A fluctuation limit theorem for a critical branching process with dependent immigration
Unit root testing in integer-valued AR(1) models
Asymptotic inference for nearly unstable INAR(1) models
Asymptotic behavior of CLS estimators for 2-type doubly symmetric critical Galton-Watson processes with immigration
Fluctuation limit of branching processes with immigration and estimation of the means
tscount: An R package for analysis of count time series following generalized linear models
Some simple models for discrete variate time series
Towards a unified asymptotic theory for autoregression
R: A Language and Environment for Statistical Computing
Functional limit theorems for critical processes with immigration
Asymptotic distribution of the CLSE in a critical process with immigration
Asymptotic distributions for weighted estimators of the offspring mean in a branching process
Flexible and robust mixed Poisson INGARCH models
Discrete analogues of self-decomposability and stability
Estimation in nonlinear time series models
Estimation of the means in the branching process with immigration
The INARCH(1) model for overdispersed time series of counts
Softplus INGARCH models
Estimation of the variances in the branching process with immigration. Probability Theory and Related Fields
A negative binomial integer-valued GARCH model
Modeling overdispersed or underdispersed count data with generalized Poisson integer-valued GARCH models