key: cord-0078430-o3b18i10
authors: Mohammadi, Zohreh; Sajjadnia, Zahra; Sharafi, Maryam; Mamode Khan, Naushad
title: Modeling Medical Data by Flexible Integer-Valued AR(1) Process with Zero-and-One-Inflated Geometric Innovations
date: 2022-05-23
journal: Iran J Sci Technol Trans A Sci
DOI: 10.1007/s40995-022-01297-3
sha: 821f776f9d467e01aaede41ae02ab2417a14f261
doc_id: 78430
cord_uid: o3b18i10

In this paper, we introduce a new stationary first-order integer-valued autoregressive (INAR) process with zero-and-one-inflated geometric innovations that is useful for modeling practical medical data. Basic probabilistic and statistical properties of the model are discussed. Conditional least squares and maximum likelihood estimators are proposed to estimate the model parameters. The performance of the estimation methods is assessed by Monte Carlo simulation experiments. The zero-and-one-inflated INAR process is subsequently applied to two medical series: the number of new COVID-19 infections in Barbados and the Poliomyelitis data. The proposed model is compared with other popular competing zero-inflated and zero-and-one-inflated INAR models on the basis of goodness-of-fit statistics and selection criteria, where it is shown to provide a better fit and hence can be considered a useful addition to the class of INAR models.

Time series of counts now arise in almost every application domain, be it economics, medicine, or the life sciences. Some examples include the monthly cases of crimes and offenses as studied in Bakouch and Ristić (2010), Ristić et al. (2009, 2012), Bourguignon and Vasconcellos (2015), Mamode Khan et al. (2020a), the daily number of newly infected patients and deaths due to SARS-CoV-2 (Mamode Khan et al. 
2020b), the weekly number of syphilis cases, the number of daily fatal road traffic accidents (Pedeli and Karlis 2011), and the tick-by-tick intra-day transactions of stocks (Pedeli and Karlis 2013; Sunecher et al. 2018), among others. In such applications, the counting series are usually characterized by frequent low figures, mainly zeros and ones, and this happens mostly when the unit of collection is at a very micro level. Likewise, the daily SARS-CoV-2 death and newly infected series in small island developing states like Barbados, Guinea-Bissau, and Sao Tome consist mainly of 0's and 1's. The same remark applies to the sex offenses, arsenic, and domestic violence data that are available at http://www.forecastingprinciples.com. The interested reader may consult more examples in Li et al. (2015) and the references therein. Such an excess of zeros or ones leads to overdispersion in the series. This paper, therefore, proposes an integer-valued autoregressive time series model of order 1 for such data series, with a zero-and-one-inflated innovation structure.

McKenzie (1985) and Al-Osh and Alzaid (1987) independently introduced the integer-valued autoregressive (INAR) process $\{X_t\}_{t \in \mathbb{N}_0}$ with one lag using a binomial thinning operator as follows:

$$X_t = \alpha \circ X_{t-1} + \varepsilon_t, \quad t \in \mathbb{N}, \qquad (1)$$

where $0 \le \alpha < 1$, $\{\varepsilon_t\}_{t \in \mathbb{N}}$ is a sequence of independent and identically distributed integer-valued random variables, called innovations, with $\varepsilon_t$ independent of $X_{t-k}$ for all $k \ge 1$, $E(\varepsilon_t) = \mu_\varepsilon$ and $\mathrm{Var}(\varepsilon_t) = \sigma_\varepsilon^2$. The binomial thinning operator "$\circ$" is defined by Steutel and van Harn (1979) as $\alpha \circ X_{t-1} = \sum_{j=1}^{X_{t-1}} Y_j$, where the counting series $\{Y_j\}_{j \ge 1}$ is a sequence of independent and identically distributed Bernoulli random variables with $P(Y_j = 1) = 1 - P(Y_j = 0) = \alpha$. From the results of Al-Osh and Alzaid (1987), $\alpha \in [0, 1)$ and $\alpha = 1$ are the conditions of stationarity and non-stationarity of the process $\{X_t\}_{t \in \mathbb{N}_0}$, respectively. 
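The thinning recursion in (1) can be illustrated with a minimal Python sketch. The function names `binomial_thinning` and `simulate_inar1`, the starting value, and the coin-flip innovation sampler are assumptions made only for this illustration; they are not taken from the paper.

```python
import random

def binomial_thinning(alpha, x, rng):
    """Return alpha o x: the number of successes among x Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate_inar1(alpha, draw_innovation, n, x0=0, seed=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t for t = 1..n, starting from x0."""
    rng = random.Random(seed)
    path, x = [], x0
    for _ in range(n):
        # Survivors of the previous count plus a fresh innovation.
        x = binomial_thinning(alpha, x, rng) + draw_innovation(rng)
        path.append(x)
    return path

# Hypothetical innovation sampler (illustration only): eps_t is 0 or 1 with equal probability.
def coin(rng):
    return rng.randint(0, 1)

path = simulate_inar1(alpha=0.5, draw_innovation=coin, n=200)
```

Any innovation sampler that returns a non-negative integer can be passed in, which mirrors the way the INAR(1) family is indexed by its innovation distribution.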
Also, $\alpha = 0$ ($\alpha > 0$) implies the independence (dependence) of the observations of $\{X_t\}_{t \in \mathbb{N}_0}$. The following representation for the marginal distribution of the INAR(1) model, provided by Al-Osh and Alzaid (1987), is expressed in terms of the innovation sequence $\varepsilon_t$:

$$X_t \stackrel{d}{=} \sum_{j=0}^{\infty} \alpha^j \circ \varepsilon_{t-j}. \qquad (2)$$

Modeling of INAR(1) time series based on (1) was first introduced using the Poisson marginal distribution by Al-Osh and Alzaid (1987) and McKenzie (1988), denoted by PINAR(1). It is a simple model and is appropriate for modeling equidispersed time series data. In many practical scenarios, as discussed above, data are overdispersed. To cater for this phenomenon in counting series, Alzaid and Al-Osh (1988) (Weiß 2008; Awale et al. 2021; Huang and Zhu 2021; Weiß 2020). However, we note that the construction of the INAR process, in addition to the self-decomposability properties, becomes simpler when the distribution of the innovation series is specified directly, without compromising on the marginal distribution of the counting series (see Bourguignon et al. 2019; Livio et al. 2018). In fact, Livio et al. (2018) confirm that such an INAR process with a pre-specified innovation yields lower AICs than other competing INAR(1) models in Mohammadpour et al. (2018).

On the other hand, where the data set contains a large number of zeros, Jazi et al. (2012b) introduced an INAR(1) process with zero-inflated Poisson innovations and showed that the marginal distribution of the process is also zero-inflated. However, in the construction of the INAR(1) process, it is not always straightforward to derive a distribution of the counting series similar to the distribution of the zero-inflated innovation series. In this sense, Barreto-Souza (2015), Bakouch and Ristić (2010) and Bourguignon et al. (2018) studied novel INAR(1) models with zero-modified geometric and zero-truncated Poisson marginal distribution, respectively, similar to the construction process in Livio et al. (2018). Furthermore, Li et al. 
(2015) developed the mixed INAR(1) process with zero-inflated generalized power series innovations, while Bakouch et al. (2018) investigated the zero-inflated geometric INAR(1) process with random coefficient. More recently, Sharafi et al. (2020) proposed the INAR(1) model with zero-modified Poisson-Lindley innovations. However, when a data set is subject to zero inflation along with one-inflation, the previous models are not very useful. In this research, we restrict our attention to modeling such data. Qi et al. (2019) introduced a stationary INAR(1) process with zero-and-one-inflated Poisson innovations. Also, Mohammadi et al. (2021) introduced the ZOIPLINAR(1) model, which is the stationary INAR(1) model with zero-and-one-inflated Poisson-Lindley distributed innovations.

The geometric distribution is one of the most important distributions used to analyze count data. Many authors, such as McKenzie (1986), Ristić et al. (2009, 2012), and Jazi et al. (2012a, b), used the geometric distribution to analyze count time series data. This fact motivated us to introduce the flexible INAR(1) model with zero-and-one-inflated geometric innovations to model count data, especially for analyzing real COVID-19 data. It should be mentioned that in the analysis of COVID-19 time series based on the PACF plot, order 1 often appears unsuitable and higher-order time series are needed. Recently, Foroughi et al. (2021) introduced a new portmanteau test to examine the null hypothesis $H_0: X_t \sim \mathrm{GINAR}(1)$ versus the alternative $H_1: X_t \sim \mathrm{GINAR}(p)$ for $p > 1$ and a wide group of INAR processes, called generalized INAR. They developed some portmanteau test statistics to check the adequacy of the fitted model. In this paper, we use these test statistics to check the adequacy of our introduced model when it is applied to the practical data example.

The paper is organized as follows. In Sect. 
2, we introduce and construct a flexible INAR(1) model and obtain some of its statistical and conditional properties. Section 3 is devoted to parameter estimation of the model and covers two estimation methods, maximum likelihood and conditional least squares. In Sect. 4, we present some simulation experiments and real-life data applications to assess the performance of the proposed zero-and-one-inflated INAR model.

In this section, we introduce a flexible INAR(1) process with zero-and-one-inflated geometric-distributed innovations, denoted by INARZOIG(1), and present some of its properties. Based on Eq. (1), we define the INARZOIG(1) process as follows:

$$X_t = \alpha \circ X_{t-1} + \varepsilon_t, \qquad (3)$$

where $0 \le \alpha < 1$ and the innovation process $\{\varepsilon_t\}$ is said to have a zero-and-one-inflated geometric distribution, denoted by $\mathrm{ZOIG}(\phi_0, \phi_1, \theta)$, with the following probability mass function (pmf), The parameter $\theta$ is the mean of the traditional geometric distribution, and the parameters $\phi_0$ and $\phi_1$ denote the unknown proportions of extra zeros and ones, respectively, beyond those allowed by the traditional geometric distribution. Also, $\varepsilon_t$ is independent of $X_s$ for all $t > s$, and it is independent of the counting series contained in the binomial thinning operator "$\circ$". Based on Du and Li (1991) and Dion et al. (1995), it can be easily shown that this process is stationary if and only if $0 \le \alpha < 1$. This process reduces to INARZIG(1) when $\phi_1 = 0$ and to INAROIG(1) when $\phi_0 = 0$. In the following proposition, some moments and conditional moments of the INARZOIG(1) process are summarized for later use.

Proposition 1 Let $\{X_t\}$ be the process defined by (3). Then The proof of Proposition 1 is similar to that of Theorem 1 in Qi et al. (2019); we omit the details here and refer the reader to Qi et al. (2019).

Using the conditional mean is one of the most common techniques for forecasting time series processes. 
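As a numerical sanity check on the innovation distribution, the sketch below evaluates one candidate form of the ZOIG pmf. This form is an assumption, not the paper's displayed equation (4): it mixes point masses at 0 and 1 with a geometric pmf on $\{0, 1, \ldots\}$ of mean $\theta$, which matches the description of $\theta$, $\phi_0$ and $\phi_1$ above and the moment relation $\mu_\varepsilon = \phi_1 + (1 - \phi_0 - \phi_1)\theta$ used in the CLS section. The function name `zoig_pmf` and the truncation point are hypothetical.

```python
def zoig_pmf(k, phi0, phi1, theta):
    """Assumed ZOIG pmf: extra mass phi0 at 0, phi1 at 1, remainder geometric with mean theta."""
    phi2 = 1.0 - phi0 - phi1
    q = theta / (1.0 + theta)          # geometric ratio; P(T = k) = (1 - q) * q**k
    geometric = (1.0 - q) * q ** k
    extra = phi0 if k == 0 else phi1 if k == 1 else 0.0
    return extra + phi2 * geometric

phi0, phi1, theta = 0.4, 0.2, 1.0
phi2 = 1.0 - phi0 - phi1
support = range(400)                   # truncation point; the tail beyond it is negligible here

total = sum(zoig_pmf(k, phi0, phi1, theta) for k in support)
mean = sum(k * zoig_pmf(k, phi0, phi1, theta) for k in support)
second = sum(k * k * zoig_pmf(k, phi0, phi1, theta) for k in support)
variance = second - mean ** 2

# Closed forms implied by the assumed pmf.
mean_formula = phi1 + phi2 * theta
variance_formula = mean_formula - mean_formula ** 2 + 2.0 * theta ** 2 * phi2
```

Under this assumed form the numerical mean and variance agree with the closed-form expressions, and the variance expression has the same structure as the innovation part of the conditional variance used in Step 2 of the CLS procedure.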
In the next proposition, the conditional mean and variance of the INARZOIG(1) process are obtained.

Proposition 2 For the INARZOIG(1) process, the $(h+1)$-step ahead forecast, which is the conditional mean, and the conditional variance are and respectively.

Proof The proof of Proposition 2 is given in the Appendix. □

It is clear that $E(X_{t+h} \mid X_{t-1}) \to \mu_\varepsilon / (1 - \alpha)$ as $h \to \infty$, which is the unconditional mean of the process. Also, the $(h+1)$-step ahead conditional variance converges to $\frac{\alpha \mu_\varepsilon + \sigma_\varepsilon^2}{1 - \alpha^2} = \sigma_X^2$ as $h \to \infty$. According to Proposition 1, the Fisher index of dispersion for the model can be calculated as where $FI_\varepsilon = 1 - \mu_\varepsilon + \frac{2\theta^2 \phi_2}{\mu_\varepsilon}$; hence the dispersion of the INARZOIG(1) process is similar to the dispersion of its innovation process, i.e., it is overdispersed (underdispersed) if the innovation process $\{\varepsilon_t\}$ is overdispersed (underdispersed). $\varepsilon_t$ is overdispersed if $\theta > \frac{\phi_1}{\sqrt{2\phi_2} - \phi_2}$, underdispersed if $0 < \theta < \frac{\phi_1}{\sqrt{2\phi_2} - \phi_2}$, and equidispersed if $\theta = \frac{\phi_1}{\sqrt{2\phi_2} - \phi_2}$.

Since the model (3) forms a stationary discrete-time Markov chain, the transition probabilities are obtained as (see, e.g., Weiß 2008): where $P(\varepsilon_t = j)$ is the pmf of $\{\varepsilon_t\}$ defined by (4) and $i, j = 0, 1, \ldots$. Hence, the marginal probability function of $X_t$ of INARZOIG(1) is obtained as: Also, the joint probability of the process using the first-order dependence can be calculated as:

Mood (1940) presented a definition of the number of "successions" of similar events preceded and succeeded by different events, and called it "the run." In this section, we find the expected length of the runs of zeros and the lengths of the runs of ones for the INARZOIG(1) process. 
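The transition probabilities of a binomial-thinning INAR(1) chain are the convolution of a Binomial($i$, $\alpha$) survivor count with the innovation pmf, which the following sketch computes. The pmf `zoig_pmf` is an assumed form of the ZOIG distribution (point masses at 0 and 1 mixed with a geometric pmf of mean $\theta$), and all function names are hypothetical. The last lines use the geometric run-length argument of this section: a run of a given state has expected length $1/(1 - p)$, where $p$ is the probability of staying in that state.

```python
from math import comb

def zoig_pmf(k, phi0, phi1, theta):
    """Assumed ZOIG pmf (not copied from the paper's equation (4))."""
    if k < 0:
        return 0.0
    q = theta / (1.0 + theta)
    extra = phi0 if k == 0 else phi1 if k == 1 else 0.0
    return extra + (1.0 - phi0 - phi1) * (1.0 - q) * q ** k

def transition_prob(i, j, alpha, phi0, phi1, theta):
    """P(X_t = j | X_{t-1} = i): k survivors out of i, plus an innovation equal to j - k."""
    return sum(
        comb(i, k) * alpha ** k * (1.0 - alpha) ** (i - k) * zoig_pmf(j - k, phi0, phi1, theta)
        for k in range(min(i, j) + 1)
    )

alpha, phi0, phi1, theta = 0.3, 0.4, 0.2, 1.0

# Thinning zero always gives zero, so staying at zero only requires eps_t = 0.
p00 = transition_prob(0, 0, alpha, phi0, phi1, theta)
p11 = transition_prob(1, 1, alpha, phi0, phi1, theta)

expected_zero_run = 1.0 / (1.0 - p00)   # does not depend on alpha
expected_one_run = 1.0 / (1.0 - p11)    # depends on alpha through p11

row_sum = sum(transition_prob(3, j, alpha, phi0, phi1, theta) for j in range(200))
```

Because `p00` equals $P(\varepsilon_t = 0)$, the expected zero-run length in this sketch is free of $\alpha$, in line with the remark that follows Theorem 1.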
Theorem 1 The expected length of the runs of zeros for the INARZOIG(1) process is and the expected length of the runs of ones for the INARZOIG(1) process is

Proof The zero-to-zero transition probability for the INARZOIG(1) process is obtained as: Therefore, the transition probability from zero to nonzero for the INARZOIG(1) process can be obtained as Since the run length of zeros is defined as the number of zeros between two nonzero values, it can be shown that it follows a geometric distribution with parameter $p^*$, and hence, the expected run length of zeros in the process is $1/p^*$. The expected run length of ones can be obtained similarly. □

The expected length of the runs of zeros is independent of $\alpha$. If $\phi_0 = 0$ or $\phi_1 = 0$, we obtain the expected length of the runs for the INAROIG(1) or INARZIG(1) process, respectively.

Theorem 2 The proportion of zeros in the INARZOIG(1) process is given by and the proportion of ones in the INARZOIG(1) process is

Proof Using part (f) of Proposition 1 and based on the following relationship between the probability generating function (pgf) and the pmf, where $\psi_{X_t}^{(k)}(\cdot)$ denotes the $k$th derivative of the pgf $\psi_{X_t}(\cdot)$, the proof is completed by calculating the following statements.

Let $X = (X_1, \ldots, X_n)$ be observations from the model (3) and In the study of integer-valued time series, different estimation methods are applied. In this section, we estimate the parameters of the INARZOIG(1) model using the conditional maximum likelihood (CML) and conditional least squares (CLS) estimation methods.

For simplicity of notation, we can write the likelihood function through the joint probability function (10) as where $P_{X_1}$ is the pmf of $X_1$ and $P_\lambda(X_{i+1} \mid X_i)$ is the conditional pmf. 
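Conditioning on the first observation, the log-likelihood reduces to a sum of log transition probabilities, which the sketch below evaluates for a given parameter vector. The ZOIG pmf used here is an assumed form (point masses at 0 and 1 mixed with a geometric pmf of mean $\theta$), and the function names and the toy series are hypothetical. The sketch only evaluates the objective; it does not perform the numerical maximization.

```python
from math import comb, log

def zoig_pmf(k, phi0, phi1, theta):
    """Assumed ZOIG pmf (not copied from the paper's equation (4))."""
    if k < 0:
        return 0.0
    q = theta / (1.0 + theta)
    extra = phi0 if k == 0 else phi1 if k == 1 else 0.0
    return extra + (1.0 - phi0 - phi1) * (1.0 - q) * q ** k

def transition_prob(i, j, alpha, phi0, phi1, theta):
    """P(X_t = j | X_{t-1} = i) for a binomial-thinning INAR(1) chain."""
    return sum(
        comb(i, k) * alpha ** k * (1.0 - alpha) ** (i - k) * zoig_pmf(j - k, phi0, phi1, theta)
        for k in range(min(i, j) + 1)
    )

def conditional_loglik(params, xs):
    """Sum of log P(X_{t+1} | X_t) over consecutive pairs, conditioning on xs[0]."""
    alpha, phi0, phi1, theta = params
    return sum(
        log(transition_prob(prev, cur, alpha, phi0, phi1, theta))
        for prev, cur in zip(xs, xs[1:])
    )

# Toy series for illustration only (not data from the paper).
toy_series = [0, 1, 0, 0, 2, 1, 0, 3, 1, 0]
value = conditional_loglik((0.3, 0.4, 0.2, 1.0), toy_series)
```

A general-purpose optimizer can then maximize `conditional_loglik` over the parameter space, subject to $0 \le \alpha < 1$, $\theta > 0$ and $\phi_0 + \phi_1 < 1$.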
To overcome the complexity of the marginal distribution, a simple approach is to work with the pmf conditioned on the first observation $X_1$, essentially ignoring the dependency on the initial value, and to obtain the conditional maximum likelihood (CML) estimate of $\lambda$ given $X_1$ by maximizing the conditional log-likelihood over $\lambda$. Since there is no closed form for the CML estimates, they are computed using numerical methods. The asymptotic properties of the CML estimators follow from Freeland and McCabe (2004).

In this subsection, we describe the estimation of the unknown parameters of the INARZOIG(1) process using the two-step CLS estimation method proposed by Karlsen and Tjøstheim (1988), which is conducted by the following steps.

Step 1 Let $\beta_1 = (\alpha, \mu_\varepsilon)'$; then the conditional least squares (CLS) estimators of the parameters $\alpha$ and $\mu_\varepsilon$ are obtained by minimizing the sum of squares $\sum_{t=2}^{n} \left( X_t - g_1(\beta_1, X_{t-1}) \right)^2$, where $g_1(\beta_1, X_{t-1}) = E(X_t \mid X_{t-1}) = \alpha X_{t-1} + \mu_\varepsilon$, and are given by

$$\hat{\alpha}_{cls} = \frac{(n-1) \sum_{t=2}^{n} X_t X_{t-1} - \sum_{t=2}^{n} X_t \sum_{t=2}^{n} X_{t-1}}{(n-1) \sum_{t=2}^{n} X_{t-1}^2 - \left( \sum_{t=2}^{n} X_{t-1} \right)^2}, \qquad (20)$$

and

$$\hat{\mu}_{\varepsilon,cls} = \frac{\sum_{t=2}^{n} \left( X_t - \hat{\alpha}_{cls} X_{t-1} \right)}{n-1}.$$

Step 2 where

$$\mathrm{Var}(X_t \mid X_{t-1}) = \hat{\alpha}_{cls} (1 - \hat{\alpha}_{cls}) X_{t-1} + \hat{\mu}_{\varepsilon,cls} - \hat{\mu}_{\varepsilon,cls}^2 + \frac{2 (\hat{\mu}_{\varepsilon,cls} - \phi_1)^2}{1 - \phi_0 - \phi_1}.$$

Therefore, the CLS criterion function for $\beta_2$ can be written as The CLS estimator $\hat{\beta}_{2,cls} = (\hat{\phi}_{0,cls}, \hat{\phi}_{1,cls})'$ of $\beta_2$ is obtained by numerical solution of (22).

Step 3 Based on the results from Steps 1 and 2, the estimator $\hat{\theta}_{cls}$ of $\theta$ can be obtained from the following equation:

$$\hat{\mu}_{\varepsilon,cls} = \hat{\phi}_{1,cls} + (1 - \hat{\phi}_{0,cls} - \hat{\phi}_{1,cls}) \hat{\theta}_{cls}.$$

Therefore, the resulting CLS estimator is $(\hat{\alpha}_{cls}, \hat{\theta}_{cls}, \hat{\phi}_{0,cls}, \hat{\phi}_{1,cls})'$.

To study the asymptotic behavior of the estimators, we make the following assumptions:
(C1) $X_t$ is a stationary and ergodic process. 
(C2) $E(X_t^4) < \infty$.

Proposition 3 Under the assumptions (C1) and (C2), the CLS estimator $\hat{\beta}_{1,cls} = (\hat{\alpha}_{cls}, \hat{\mu}_{\varepsilon,cls})'$ is strongly consistent and asymptotically normal, Based on Propositions 3 and 4 and Theorem 3.2 in Nicholls and Quinn (1982), we have the following proposition.

Proposition 5 Under the assumptions (C1) and (C2), the CLS estimator $\hat{\beta}_{cls} = (\hat{\beta}_{1,cls}, \hat{\beta}_{2,cls})'$ is strongly consistent and asymptotically normal, Based on the above proposition, we state the strong consistency and asymptotic normality of $\hat{\lambda}_{cls}$ in the following proposition.

Proposition 6 Under the assumptions (C1) and (C2), the CLS estimator $\hat{\lambda}_{cls}$ is strongly consistent and asymptotically normal,

$$\sqrt{n} \left( \hat{\lambda}_{cls} - \lambda_0 \right) \xrightarrow{L} N(0, D \Omega D'),$$

and $\phi_2 = 1 - \phi_0 - \phi_1$. The brief proofs of Propositions 3-6 are given in the Appendix.

This part of the paper includes two subsections. In the first, the performance of the estimation methods presented in the previous section is evaluated through a simulation study. Moreover, the empirical distribution of the simulated sample path at the points zero and one is compared with the results of Eqs. (15) and (16). To examine the practical performance of the proposed process, the second part focuses on two real-life series: the number of daily infected cases due to COVID-19 in Barbados, available at https://ourworldindata.org/covid-cases, and the Poliomyelitis data from Zeger (1988) and Maiti et al. (2018).

To conduct the simulation study, we need to generate a random sample from the INARZOIG(1) process. Based on the second stochastic representation in Zhang et al. (2016), we first generate a random sample $\varepsilon_1, \ldots, \varepsilon_n$ from $\mathrm{ZOIG}(\phi_0, \phi_1, \frac{\theta}{1+\theta})$ and then simulate $\{X_t\}_{t=1}^{n}$ from the INARZOIG(1) model. The simulation comprises the following steps:
Step 1. Generate $Z_1, \ldots, Z_n$ from $\mathrm{Bernoulli}(1 - \phi)$,
Step 2. From $\mathrm{Bernoulli}(p)$ generate $g_1, \ldots, g_n$,
Step 3. From $\mathrm{Ge}(\frac{\theta}{1+\theta})$ generate $T_1, \ldots, T_n$,
Step 4. 
Use $\varepsilon_i = (1 - Z_i) g_i + Z_i T_i$ for $i = 1, \ldots, n$ to generate $\varepsilon_1, \ldots, \varepsilon_n$, where $\phi_0 = \phi (1 - p)$ and $\phi_1 = \phi p$.

According to the above algorithm, we generate a random sample (with n = 1000) from the INARZOIG(1) process with $\phi_0 = 0.4$, $\phi_1 = 0.2$, $\theta = 1, 5$ and $\alpha = 0.1, 0.5$. The sample path and barplot of the marginal distribution of this simulated count time series are presented in Fig. 1. As can be seen from Fig. 1, for all values of $\alpha$ and larger values of $\theta$, the sample path tends to have larger values. For all values of $\alpha$ and smaller values of $\theta$, however, the process has a strong tendency to return to zero or one, with smaller mean and variance, which is clear from parts (b) and (d) of Proposition 1. In addition, Fig. 1 shows that the number of zeros and ones increases as $\theta$ decreases.

To compare the performance of the CML and the CLS estimators, we simulate the data for n = 50, 100, 200, 500, 1000, $\alpha = 0.2$, $\phi_0 = 0.1, 0.4$, $\phi_1 = 0.1$ and $\theta = 1, 3$ with 10,000 replications. The mean and mean squared error (MSE) of the estimates are computed to evaluate them. The function "nlminb" in R is used to obtain these estimates. The results of the simulation are given in Tables 1 and 2. These tables show that the CML estimator performs better than the CLS estimator because of its smaller MSE (except for a few cases).

Fig. 1 Barplots of limiting marginal distribution and sample paths of the simulated INARZOIG(1) process for $\phi_0 = 0.4$, $\phi_1 = 0.2$, $\theta = 1, 5$ and $\alpha = 0.1, 0.5$

In Table 3, we compare the empirical distribution of the simulated sample path with Eqs. (15) and (16)

In this subsection, using two real-life data sets, we show the applicability of the INARZOIG(1). In the first example, we use the data of new infected cases in Barbados from March 17, 2020, until January 02, 2021, and (1) process with one-inflated geometric-distributed innovations) for these data sets. 
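The four-step innovation generator of the simulation study and the closed-form Step 1 CLS estimators in (20) can be sketched in Python as follows. The function names, the seed, the zero starting value without burn-in, and the sample size are assumptions made for this illustration; the geometric draw uses inverse-transform sampling, which is one of several valid ways to sample $T_i$ with mean $\theta$, and the code requires $\phi_0 + \phi_1 > 0$.

```python
import math
import random

def draw_zoig(phi0, phi1, theta, rng):
    """One ZOIG innovation via eps = (1 - Z) * g + Z * T (requires phi0 + phi1 > 0)."""
    phi = phi0 + phi1                              # total inflation weight
    p = phi1 / phi                                 # share of inflation mass placed on one
    z = 1 if rng.random() < 1.0 - phi else 0       # Step 1: Z ~ Bernoulli(1 - phi)
    g = 1 if rng.random() < p else 0               # Step 2: g ~ Bernoulli(p)
    q = theta / (1.0 + theta)
    u = 1.0 - rng.random()                         # uniform on (0, 1]
    t = int(math.log(u) / math.log(q))             # Step 3: geometric on {0, 1, ...}, mean theta
    return (1 - z) * g + z * t                     # Step 4

def simulate_inarzoig1(n, alpha, phi0, phi1, theta, seed=1):
    """Simulate X_t = alpha o X_{t-1} + eps_t, started at zero with no burn-in."""
    rng = random.Random(seed)
    x, path = 0, []
    for _ in range(n):
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)
        x = survivors + draw_zoig(phi0, phi1, theta, rng)
        path.append(x)
    return path

def cls_step1(xs):
    """Closed-form CLS estimates of alpha and the innovation mean (Step 1)."""
    prev, cur = xs[:-1], xs[1:]
    m = len(cur)                                   # m = n - 1 pairs
    s_xy = sum(a * b for a, b in zip(cur, prev))
    s_prev, s_cur = sum(prev), sum(cur)
    s_prev2 = sum(a * a for a in prev)
    alpha_hat = (m * s_xy - s_cur * s_prev) / (m * s_prev2 - s_prev ** 2)
    mu_hat = (s_cur - alpha_hat * s_prev) / m
    return alpha_hat, mu_hat

path = simulate_inarzoig1(20000, alpha=0.5, phi0=0.4, phi1=0.2, theta=1.0)
alpha_hat, mu_hat = cls_step1(path)
```

With these settings the innovation mean is $\phi_1 + (1 - \phi_0 - \phi_1)\theta = 0.6$, so `alpha_hat` and `mu_hat` should land near 0.5 and 0.6 for a long path.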
We use the AIC (Akaike information criterion), loglik (log-likelihood function), AICc (corrected version of the AIC), BIC (Bayesian information criterion), PMAE(h) (predicted mean absolute error), and PTP(h) (percentage of true prediction) criteria, where the last two are h-step ahead forecasting accuracy measures. To calculate the last two measures, we divide the data into two parts. The first part is used to fit the considered models, and the second part, which consists of the last 20 observations, is used to compute $\hat{X}_{t+h}$; the PMAE(h) and PTP(h) are then computed for h = 1.

In this subsection, using a real data set, we show the applicability of the INARZOIG(1). We use the data of new infected cases in Barbados from the 17th of March 2020 until the 2nd of January 2021. This data set has 292 observations, of which 148 (51%) are zero, 64 (22%) are one, and the other 80 (27%) are greater than one. The mean and variance of the observations are 1.35 and 5.60, respectively, and hence their Fisher index is 4.15, which shows that the data are overdispersed. The barplot, series plot, ACF and PACF are plotted in Figs. 2 and 3, respectively. It is noted that the PACF yields The LRT statistic is equal to 3.937 and the critical value at level 0.05 is equal to 3.841. Hence, we can conclude that the null hypothesis is rejected and the zero-and-one-inflated distribution is more suitable than the zero-inflated model for this data set. Also, we calculated two forecasting accuracy measures; however, they are the same for all models: PMAE is equal to 4.45 and PTP is equal to 20. The last figure shows the daily new infected cases of COVID-19 in Barbados and their predicted values using INARZOIG(1). As can be seen, the predicted values are close to the original data, which indicates the good forecasting performance of the proposed fitted model (Fig. 4). 
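The summary figures quoted for the Barbados series can be re-derived with a few lines of arithmetic. The sketch below assumes the likelihood ratio test has one degree of freedom (one extra parameter, $\phi_1$, in the zero-and-one-inflated model), which is consistent with the quoted critical value; the variable names are hypothetical.

```python
from statistics import NormalDist

# Summary statistics reported for the Barbados series.
n_obs, zeros, ones = 292, 148, 64
mean, variance = 1.35, 5.60

share_zeros = zeros / n_obs
share_ones = ones / n_obs
fisher_index = variance / mean               # values above 1 indicate overdispersion

# A chi-square(1) variable is a squared standard normal, so its 95% quantile
# equals the square of the 97.5% standard normal quantile.
critical_value = NormalDist().inv_cdf(0.975) ** 2

lrt_statistic = 3.937                        # value reported in the text
reject_zero_inflated_only = lrt_statistic > critical_value
```

The margin between the statistic and the critical value is small, so the rejection of the zero-inflated-only model at the 5% level is a close call.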
In this subsection, we consider the Poliomyelitis data, which are the monthly cases in the USA from 1970 to 1983. These data were first analyzed by Zeger (1988). This data set has 168 observations, of which 64 (38%) are zero, 55 (32%) are one, and the other 49 (30%) are greater than one. The mean and variance of the observations are 1.33 and 3.50, respectively, and hence their Fisher index is 2.63. The value of the Fisher index indicates that the data are overdispersed. Recently, Maiti et al. (2018) considered these data and fitted most of the existing INAR(1) models, including the Poisson INAR(1), overdispersed models such as the geometric INAR(1) and compound Poisson INAR(1), zero-inflated models such as the zero-inflated and zero-modified INAR(1), and their proposed sub-model, the one-modified geometric INAR(1) (OMGINAR(1)). Using some goodness-of-fit criteria and 1-step ahead forecasting accuracy measures, they showed that OMGINAR(1) had the best fit among all considered models. Now, we analyze the data further. First, we plot the barplot and the series plot in Fig. 5. These figures and the frequencies of the observed zeros and ones show the excess of zeros and ones. This fact and the overdispersion of the data motivated us to fit the INARZOIG(1) model to this data set. The ACF and PACF of the data are plotted in Fig. 6. Based on the conclusions of Maiti et al. (2018) about this data set, we compare our model with OMGINAR(1) and use the criteria reported in that paper for this model. Also, we consider the ZOIPLINAR(1), introduced by Mohammadi et al. (2021), as another alternative for comparison. We use the Loglik, AIC, AICc, and BIC criteria, and the results are reported in Table 6. As can be seen, the INARZOIG(1) model has the largest Loglik and the smallest AIC and AICc, but the BIC of the OMGINAR(1) is the smallest. 
Nevertheless, based on Raftery (1995), since the difference between these BIC values is less than 2, it is not significant, and the other criteria show that the INARZOIG(1) is more suitable for this data set. We can conclude that our introduced model has the best fit on this data set; however, the forecasting accuracy measures are the same for all considered models, with PMAE equal to 0.95 and PTP equal to 45. Moreover, Fig. 7, which shows the Poliomyelitis data and their predicted values, indicates that the predicted values are close to the real data. This figure also indicates the good forecasting performance of the INARZOIG(1).

This paper analyzes zero-and-one-inflated time series using a flexible INAR(1) model based on zero-and-one-inflated geometric-distributed innovations. The main properties of this novel INAR(1) process are established, and its model parameters are estimated via the CML and CLS approaches. The performance of the two estimation techniques is assessed through Monte Carlo experiments in which both approaches are shown to provide consistent estimates, with the CML approach providing less biased estimates. Furthermore, the INARZOIG(1) model is applied to the COVID-19 series from Barbados, which is found to contain a large proportion of zeros and ones. Also, using the portmanteau test, we indicate that order 1 is suitable for this data set. As a second example, the model is applied to another real data set, the monthly Poliomyelitis cases in the USA from 1970 to 1983, first analyzed by Zeger (1988). In both data applications, the INARZOIG(1) model is shown to provide better fitting criteria than the existing competing models. Evidently, the statistical performance of the INARZOIG(1) depends on the nature of the data as well, but overall, the INARZOIG(1) model is a worthy contribution to the class of INAR models. 
First-order integer-valued autoregressive (INAR(1)) process
First-order integer-valued autoregressive (INAR(1)) process: distributional and regression properties
Some autoregressive moving average processes with generalized Poisson marginal distributions
Forecasting overdispersed INAR(1) count time series with negative binomial marginal
Zero-truncated Poisson integer-valued AR(1) model
A zero inflated geometric INAR(1) process with random coefficient
Zero-modified geometric INAR(1) process for modelling count time series with deflation or inflation of zeros
First order non-negative integer valued autoregressive processes with power series innovations
A new geometric INAR(1) process based on counting series with deflation or inflation of zeros
Extended Poisson INAR(1) processes with equidispersion, underdispersion and overdispersion
Branching processes with immigration and integer-valued time series
The integer-valued autoregressive (INAR(p)) model
Analysis of low count time series data by Poisson autoregression
Portmanteau tests for generalized integer-valued autoregressive time series models
A new first-order integer-valued autoregressive model with Bell innovations
Integer valued AR(1) with geometric innovations
First-order integer valued AR processes with zero inflated Poisson innovations
Consistent estimates for the NEAR(2) and NLAR(2) time series models
On conditional least squares estimation for stochastic processes
An INAR(1) model with Poisson-Lindley innovations
First-order mixed integer-valued autoregressive processes with zero-inflated generalized power series innovations
A new extension of thinning-based integer-valued autoregressive models for count data
Time series of zero inflated counts and their coherent forecasting
Modelling of low count heavy tailed time series data consisting large number of zeros and ones
The family of the bivariate integer-valued autoregressive process (BINAR(1)) with Poisson-Lindley (PL) innovations
Studying the trend of the novel coronavirus series in Mauritius and its implications
Some simple models for discrete variate time series
Autoregressive moving-average processes with negative-binomial and geometric marginal distributions
Some ARMA models for dependent sequences of Poisson counts
Zero-and-one inflated Poisson-Lindley INAR(1) process for modelling count time series with extra zeros and ones
Poisson-Lindley INAR(1) model with applications
Mood AM (1940) The distribution theory of runs
Random coefficient autoregressive models: an introduction
Some properties of multivariate INAR(1) processes
Modeling time series of count with excess zeros and ones based on INAR(1) model with zero-one inflated Poisson innovations
Bayesian model selection in social research
A new geometric first-order integer-valued autoregressive (NGINAR(1)) process
Estimation in an integer-valued autoregressive process with negative binomial marginals (NBINAR(1))
Compound Poisson INAR(1) processes: stochastic properties and testing for overdispersion
Sharafi M, Sajjadnia Z, Zamani A (2020) A first-order integer-valued autoregressive process with zero-modified Poisson-Lindley distributed innovations
Discrete analogues of self-decomposability and stability
A case study of MCB and SBMH stock transaction using a novel BINMA(1) with non-stationary NB correlated innovations
Estimation in nonlinear time series models
Thinning operations for modeling time series of counts-a survey
Stationary count time series models
Modeling overdispersed or underdispersed count data with generalized Poisson integer-valued autoregressive processes
A regression model for time series of counts
Properties of the zero-and-one inflated Poisson distribution and likelihood-based inference methods

Proof of Proposition 2

Proof of Propositions 3 and 4 These two Propositions are similar to Theorem 1 and 2 in Yang et al. 
(2019), which can be proved by verifying the regularity conditions of Theorems 3.1 and 3.2 in Klimko and Nelson (1978). For instance, in the proof of Proposition 3, the partial deriva-have finite fourth moments in Klimko and Nelson (1978), and $u_m(\alpha_0)$ in Klimko and Nelson (1978) corresponds to $q_{11}(\beta_{1,0})$ in Step 1. Hence, Proposition 3 can be regarded as a direct conclusion of Theorem 3.2.

Proof of Proposition 5 The proof of Proposition 5 is similar to that of Proposition 4 in Liu and Zhu (2021); we omit the details here and refer the reader to Liu and Zhu (2021).

Proof of Proposition 6 This is an application of the $\delta$-method. For completeness, we refer the reader to Theorem A on p. 122 of Serfling (1980) for a proof.

Author Contributions All authors made substantial contributions to the conception and design and finally approved the manuscript. Material preparation, data collection and analysis were performed by ZM, MS, ZS and NMK. All authors contributed to the drafting and preparation of the manuscript and the study protocol and provided the approval of the final version of the manuscript.

Funding There is no funding for this manuscript.

Availability of data and material (data transparency) The websites that contain the data are presented in the main manuscript.