key: cord-0163231-vk86qna0
authors: Armillotta, Mirko; Fokianos, Konstantinos
title: Poisson Network Autoregression
date: 2021-04-13
journal: nan
DOI: nan
sha: b1209d0ab6506f20f1a0b93190197f96a83a3d5f
doc_id: 163231
cord_uid: vk86qna0

We consider network autoregressive models for count data with a non-random neighborhood structure. The main methodological contribution is the development of conditions that guarantee stability and valid statistical inference for such models. We consider both cases of fixed and increasing network dimension and we show that quasi-likelihood inference provides consistent and asymptotically normally distributed estimators. The work is complemented by simulation results and a data example.

The vast availability of integer-valued data, emerging from several real world applications, has motivated the growth of a large literature for modeling and inference for count time series processes.

For comprehensive surveys, see Kedem and Fokianos (2002) , and Weiß (2018) .

Early contributions to the development of count time series models were the Integer Autoregressive models (INAR) Al-Osh and Alzaid (1987) ; Alzaid and Al-Osh (1990) and observation (Zeger and Liang, 1986) or parameter driven models (Zeger, 1988) . The latter classification, due to Cox (1981) , will be particular useful as we will be developing theory for Poisson observation-driven models. In tions (Genest and Nešlehová, 2007, pp. 507-508) . They propose a plausible data generating process which preserves, marginally, Poisson processes properties. Further details are given by the recent review of Fokianos (2022) .

The aim of this contribution is to show that multivariate observation-driven count time series models are suitable for Modelling time-varying network data. Such data is increasingly available in many scientific areas (social networks, epidemics, etc.) . Measuring the impact of a network structure to a multivariate time series process has attracted considerable attention over the last years. In an unpublished work, Knight et al. (2016) defined multivariate continuous time series coupled with a network structure as network time series. Furthermore these authors proposed methodology for the analysis of such data. Such approach has been originally proposed in the context of spatio-temporal data analysis, referred to as Space-Time Autoregressive Moving Average (STARMA) models; Cliff and Ord (1975) and Martin and Oeppen (1975) , among many others.

Indeed, a wide variety of available spatial streaming data related to physical phenomena can fit this framework. In general, any stream of data for a sample of units whose relations can be modelled through an adjacency matrix (neighborhood structure), adhere to statistical techniques reviewed in this work. Zhu et al. (2017) developed a similar model, called Network Autoregressive model (NAR) , which is an autoregressive model for continuous network data and established associated least squares inference under two asymptotic regimes (a) with increasing time sample size T → ∞ and fixed network dimension N and (b) with both N, T increasing. More precisely, it is assumed that N → ∞ and T N → ∞, i.e. the temporal sample size is assumed to depend on N . The regime (a) corresponds to standard asymptotic inference in time series analysis. However, in network analysis it is important to understand the behavior of the process when the network's dimension grows. This is a relevant problem in fields where typically the network is large, see, for example, social networks in Wasserman et al. (1994) . From the empirical point of view, it is crucial to have conditions for large network structures, such that time series inference is still valid; those problems motivate study of asymptotics (b). Significant extension of this work to network quantile autoregressive models has been recently reported by Zhu et al. (2019) . Some other extensions of the NAR model include the grouped least squares estimation (Zhu and Pan, 2020) and a network version of the GARCH model, see Zhou et al. (2020) for the case of T → ∞ and fixed network dimension N . Under the standard asymptotic regime (a), related work was also developed by Knight et al. (2020) who specified a Generalized Network Autoregressive model (GNAR) for continuous random variables, which takes into account different layers of relationships within neighbors of the network. Moreover, the same authors provide R software (package GNAR) for fitting such models.

Following the discussion in the concluding remarks of Zhu et al. (2017 Zhu et al. ( , p. 1116 , an extension of the NAR model to accommodate discrete responses would be of crucial importance. Indeed, integer-valued responses are commonly encountered in real applications and are strongly connected to network data. For example, several data of interest in social network analysis correspond to integer-valued responses (number of posts, number of likes, counts of digit employed in comments, etc). Another typical field of application for count variable is the number of cases in epidemic models for studying the spread of infection diseases in a population; this is even more important in the current COVID-19 pandemic outbreak. Recently, an application of this type which employs a model similar to the NAR with count data has been presented in Bracher and Held (2020) . Therefore, the extension of the NAR model to multivariate count time series is an important theoretical and methodological contribution which is not covered by the existing literature, to the best of our knowledge.

The main goal of this work is to fill this gap by specifying linear and log-linear Poisson network autoregressions (PNAR) for count processes and by studying in detail the two related types of asymptotic inference discussed above. Moreover, the development of all network time series models discussed so far relies strongly on the assumption that t innovations are Independent and Identically Distributed (IID). Such a condition might not be realistic in many applications. We overcome this limitation by employing the notion of L p -near epoch dependence (NED), see Andrews (1988) , Pötscher and Prucha (1997) , and the related concept of α-mixing (Rosenblatt, 1956; Doukhan, 1994) . These notions allow relaxation of the independence assumption as they provide some guarantee of asymptotic independence over time. An elaborate and flexible dependence structure among variables, over time and over the nodes composing the network, is available for all models we consider due to the definition of a full covariance matrix, where the dependence among variables is captured by the copula construction introduced in Fokianos et al. (2020) . For an alternative approach to Modelling multivariate counts in continuous time see Veraart (2019) , Eyjolfsson and Tjøstheim (2021) , and Fang et al. (2021) for a network model employing Hawkes processes which are related to the linear and log-linear model we will be studying. Indeed those models are obtained after suitable discretization of the corresponding continuous time process. However our proposal imposes a specific data generating process, does not assume homogeneity across the network and the condition required for obtaining good large sample properties of the QMLE are quite different than those assumed by Fang et al. (2021) .

For the continuous-valued case, Zhu et al. (2017) employed ordinary least square (OLS) estimation combined with specific properties imposed on the adjacency matrix for the estimation of unknown model parameters. However, this method is not applicable to general time series models.

In the case we study, estimation is carried out by using quasi-likelihood methods; see Heyde (1997) , for example. When the network dimension N is fixed and T → ∞, standard results already available for Quasi Maximum Likelihood Estimation (QMLE) of Poisson stationary time series, as developed by Fokianos et al. (2020) , carry over to the case of PNAR models. However, the asymptotic properties of the estimators rely on the convergence of sample means to the associated expectations due to the ergodicity of a stationary random process {Y t : t ∈ Z} (or a perturbed version of it).

The stationarity of an N -dimensional time series, with N → ∞, is still an open problem and it is not clear how it can be introduced. As a consequence, standard time series results concerning convergence of sample means do not carry over to the increasing dimension case. In the present contribution, this problem is bypassed by providing an alternative proof, based on the laws of large numbers for L p -NED processes of Andrews (1988) . Our method employs a working definition of stationarity Zhu et al. (2017, Def. 1) for processes of increasing dimension.

The paper is organized as follows: Section 2 discusses the PNAR(p) model specification for the linear and the log-linear case, with lag order p, and the related stability properties. In Section 3, quasi-likelihood inference is established, showing consistency and asymptotic normality of the QMLE for the two types of asymptotics (a)-(b). Section 4 discusses the results of a simulation study and an application on real data. The paper concludes with an Appendix containing all the proofs of the main results and the specification of the first two moments for the linear PNAR model.

Notation: We denote |x| r = ( p j=1 |x j | r ) 1/r the l r -norm of a p-dimensional vector x. If r = ∞, |x| ∞ = max 1≤j≤p |x j |. Let X r = ( p j=1 E(|X j | r )) 1/r the L r -norm for a random vector X. For a q × p matrix A = (a ij ), i = 1, . . . , q, j = 1, . . . , p, denotes the generalized matrix norm |||A||| r = max |x| r =1 |Ax| r . If r = 1, then |||A||| 1 = max 1≤j≤p q i=1 |a ij |. If r = 2, |||A||| 2 = ρ 1/2 (A T A), where ρ(·) is the spectral radius. If r = ∞, |||A||| ∞ = max 1≤i≤q p j=1 |a ij |. If q = p, then these norms are matrix norms. Define |x| r v = (|x 1 | r , . . . , |x p | r ) , |A| v = (|a i,j |) (i,j) , X r,v = (E 1/r |X 1 | r , . . . , E 1/r |X p | r ) and a partial order relation on x, y ∈ R p such that x y means

x i ≤ y i for i = 1, . . . , p. For a d-dimensional vector z, with d → ∞, set the following compact notation max 1≤i<∞ z i = max i≥1 z i . The notations C r and D r denote a constant which depend on r, where r ∈ N. In particular C denotes a generic constant. Finally, throughout the paper the notation {N, T N } → ∞ will be used as a shorthand for N → ∞ and T N → ∞, where the temporal size T is assumed to depend on the network dimension N .

We consider a network with N nodes (network size) and index i = 1, . . . N . The structure of the network is completely described by the adjacency matrix A = (a ij ) ∈ R N ×N , i.e. a ij = 1 provided that there exists a directed edge from i to j, i → j (e.g. user i follows j on Twitter), and a ij = 0 otherwise. However, undirected graphs are allowed (i ↔ j). The structure of the network is assumed non-random. Self-relationships are not allowed, i.e. a ii = 0 for any i = 1, . . . , N ; this is a typical assumption, and it is reasonable for various real situations, e.g. social networks, where an user cannot follow himself. For details about the definition of social networks see Wasserman et al. (1994) , Kolaczyk and Csárdi (2014) . Let us define a certain count variable Y i,t ∈ R for the node i at time t. We want to assess the effect of the network structure on the count variable {Y i,t } for i = 1, . . . , N over time t = 1, . . . , T .

In this section, we study the properties of linear and log-linear models. We initiate this study by considering a simple, yet illuminating, case of a linear model of order one and then we consider the more general case of p'th order model. Finally, we discuss log-linear models. In what follows, we denote by {Y t = (Y i,t , i = 1, 2 . . . N, t = 0, 1, 2 . . . , T )} an N -dimensional vector of count time series with {λ t = (λ i,t , i = 1, 2 . . . N, t = 1, 2 . . . , T )} be the corresponding N -dimensional intensity process vector. Define by F t = σ(Y s : s ≤ t). Based on the specification of the model, we assume that λ t = E(Y t |F t−1 ).

A linear count network model of order 1, is given by

where n i = j =i a ij is the out-degree, i.e the total number of nodes which i has an edge with. From the left hand side equation of (1), we observe that the process Y i,t is assumed to be marginally Poisson. We call (1) linear Poisson network autoregression of order 1, abbreviated by PNAR(1).

Model (1) postulates that, for every single node i, the marginal conditional mean of the process is regressed on the past count of the variable itself for i and the average count of the other nodes j = i which have a connection with i. This model assumes that only the nodes which are directly followed by the focal node i possibly have an impact on the mean process of counts. It is a reasonable assumption in many applications; for example, in a social network the activity of node k, which satisfies a ik = 0, does not affect node i. The parameter β 1 is called network effect, as it measures the average impact of node i's connections n −1 i N j=1 a ij Y j,t−1 . The coefficient β 2 is called momentum effect because it provides a weight for the impact of past count Y i,t−1 . This interpretation is in line with the Gaussian network vector autoregression (NAR) as discussed by Zhu et al. (2017) for the case of continuous variables.

Equation (1) does not include information about the joint dependence structure of the PNAR(1) model. It is then convenient to rewrite (1) in vectorial form, following Fokianos et al. (2020) ,

where β 0 = β 0 1 N ∈ R N , with 1 = (1, 1, . . . , 1) T ∈ R N , and the matrix G = β 1 W + β 2 I N ,

where W = diag n −1 1 , . . . , n −1 N A is the row-normalized adjacency matrix, with A = (a ij ), so w i = (a ij /n i , j = 1, . . . , N ) T ∈ R N is the i-th row vector of the matrix W, satisfying |||W||| ∞ = 1, and I N is the N × N identity matrix. In general, the weights w i can be chosen arbitrarily as long as |||W||| ∞ = 1 is satisfied. Finally, {N t } is a sequence of independent N -variate copula-Poisson processes. By this we mean that N t (λ t ) is a sequence of N -dimensional IID Poisson count processes, with intensity 1, counting the number of events in the interval of time [0, λ 1,t ] × · · · × [0, λ N,t ], and whose structure of dependence is modelled through a copula construction C(. . . ) on their associated exponential waiting times random variables. More precisely, consider a set of values (β 0 , β 1 , β 2 ) T and a starting vector λ 0 = (λ 1,0 , . . . , λ N,0 ) T , 1. Let U l = (U 1,l , . . . , U N,l ), for l = 1, . . . , K a sample from a N -dimensional copula C(u 1 , . . . , u N ),

where U i,l follows a Uniform(0,1) distribution, for i = 1, . . . , N .

3. If X i,1 > 1, then Y i,0 = 0, otherwise Y i,0 = max k ∈ [1, K] : k l=1 X i,l ≤ 1 , by taking K large enough. Then, Y i,0 ∼ P oisson(λ i,0 ), for i = 1, . . . , N . So, Y 0 = (Y 1,0 , . . . , Y N,0 ) is a set of marginal Poisson processes with mean λ 0 . 4. By using the model (2), λ 1 is obtained. 5. Return back to step 1 to obtain Y 1 , and so on.

In practical applications the sample size K should be a large value, e.g. K = 1000; its value clearly depends, in general, on the magnitude of observed data. Moreover, the copula construction C(. . . ) will depend on one or more unknown parameters, say ρ, which capture the contemporaneous correlation among the variables.

The development of a multivariate count time series model would be based on specification of a joint distribution, so that the standard likelihood inference and testing procedures can be developed. Although several alternatives have been proposed in the literature, see the review in Fokianos (2022, Sec. 2) , the choice of a suitable multivariate version of the Poisson probability mass function (p.m.f) is a challenging problem. In fact, multivariate Poisson-type p.m.f have usually complicated closed form and the associated likelihood inference is theoretically and computationally cumbersome. Furthermore, in many cases, the available multivariate Poisson-type p.m.f. implicitly imply restricted models, which are of limited use in applications (e.g. covariances always positive, constant pairwise correlations). For these reasons, in this work the joint distribution of the vector {Y t } is constructed by following the copula approach described above. The proposed data generating process ensures that all marginal distributions of Y i,t are univariate Poisson, conditionally to the past, as described in (1), while it introduces an arbitrary dependence among them in a flexible and general way by the copula construction. For a comprehensive discussion on the choice of a multivariate count distribution and the comparison between the alternatives proposed in the literature, the interested reader can refer to Inouye et al. (2017) and Fokianos (2022) . Further results regarding the empirical properties of model (2) are discussed in Section S-1 of the Supplementary Material (henceforth SM).

More generally, we introduce and study an extension of model (1) by allowing Y i,t to depend on the last p lagged values. We call this the linear Poisson NAR(p) model and its defined analogously to (1) but with

where β 0 , β 1h , β 2h ≥ 0 for all h = 1 . . . , p. If p = 1, set β 11 = β 1 , β 22 = β 2 to obtain (1). The joint distribution of the vector Y t is defined by means of the copula construction discussed in Sec. 2.1.

Without loss of generality, we can set coefficients equal to zero if the parameter order is different in both terms of (3). Its is easy to see that (3) can be rewritten as

where G h = β 1h W + β 2h I N for h = 1, . . . , p by recalling that W = diag n −1 1 , . . . , n −1 N A. We have the following result which gives sharp verifiable conditions.

Proposition 1. Consider model (4), with fixed N . Suppose that ρ( p h=1 G h ) < 1. Then, the process {Y t , t ∈ Z} is stationary and ergodic with E |Y t | r 1 < ∞ for any r ≥ 1.

The result follows from Debaly and Truquet (2021, Thm. 2) . Similar results have been recently proved by Fokianos et al. (2020) when there exists a feedback process in the model. Following these authors, we obtain the same results of Proposition 1 but under stronger conditions. For example, when p = 1, we will need to assume either |||G||| 1 < 1 or |||G||| 2 < 1 to obtain identical conclusions.

Results about the first and second order properties of model (3) are given in Appendix A.1; see also Fokianos et al. (2020, Prop. 3 .2).

Prop. 1 establishes the existence of the moments of the count process with fixed N , but this property is not guaranteed in the case that N → ∞. The following results show that when N tends to infinity, the conclusions of Prop. 1 are still true.

Proposition 2. Consider model (4) and p h=1 (β 1h + β 2h ) < 1. Then, max i≥1 E |Y i,t | r ≤ C r < ∞, for any r ∈ N.

In order to investigate the stability results of the process Y t ∈ N N when the network size is diverging (N → ∞) we employ the working definition of stationarity for increasing dimensional processes described in Zhu et al. (2017, Def. 1) . The following result holds.

Theorem 1. Consider model (4). Assume p h=1 (β 1h + β 2h ) < 1 and N → ∞. Then, there exists a unique strictly stationary solution {Y t ∈ N N , t ∈ Z} to the linear PNAR model, with

Thm. 1 is a counterpart of Zhu et al. (2017, Thm.1) for continuous values NAR model but it assures all the moments of the integer-valued process are uniformly bounded. Although stronger than the requirements in Prop. 1, the condition p h=1 (β 1h + β 2h ) < 1 allows to prove stationarity for increasing network size N and the existence of moments of the process; moreover, it is more natural than ρ( p h=1 G h ) < 1, and it complements the existing work for continuous valued models; Zhu et al. (2017) . In addition, the former will be also required for the inferential results of Section 3 below.

It is worth pointing out that the copula construction is not used in the proof of Theorem 1 (see also Theorem 2 for log-linear model). However, it is used in Section 4.1 where we report a simulation study. It is interesting though, that even under this setup, stability conditions are independent of the correlation structure of innovations; this is similar to the case of multivariate ARMA models.

Remark 1. A count GNAR(p) extension similar to the model introduced by Knight et al. (2020, eq . 1), for the standard asymptotic regime (T → ∞), in the context of continuous-valued random variables, is included in the framework we consider. Such model adds an average neighbor impact for several stages of connections between the nodes of a given network. That is,

. . , N } : i → j} the set of neighbors of the node i. (So, for example, N (2) (i) describes the neighbors of the neighbors of the node i, and so on.) In this case, the row-normalized

, card(·) denotes the cardinality of a set and I(·) is the indicator function. Several M types of edges are allowed in the network. The Poisson GNAR(p) has the following formulation.

where s h is the maximum stage of neighbor dependence for the time lag h. Model (5) 

Recall model (1). The network effect β 1 of model (1) is typically expected to be positive, see Chen et al. (2013) , and the impact of Y i,t−1 is positive, as well. Hence, positive constraints on the parameters are theoretically justifiable as well as practically sound. However, in order to allow a natural link to the GLM theory, McCullagh and Nelder (1989) , and allowing the possibility to insert covariates as well as coefficients which take values on the entire real line, we propose the following log-linear model, see Fokianos and Tjøstheim (2011) :

where ν i,t = log(λ i,t ) for every i = 1, . . . , N . No parameters constraints are required for model (6) since ν i,t ∈ R. The interpretation of all parameters remains the same as in the case of (1). Again, the model can be rewritten in vectorial form, as in the case of model (2)

where N t (ν t ) is a sequence of independent N -dimensional copula-Poisson processes which counts the number of events in the interval of time [0, exp(ν 1,t )] × · · · × [0, exp(ν N,t )], and ν t ≡ log(λ t ),

componentwise. Furthermore, it can be useful rewriting the model as follow.

where ψ t = log(1 N + Y t ) − ν t . By Lemma A.1 in Fokianos and Tjøstheim (2011) E(ψ t |F t−1 ) → 0 as ν t → ∞, so ψ t is approximately martingale difference sequence (MDS). This means that the formulation of first two moments established for the linear model in Appendix A.1 hold, approximately, for log(1 N + Y t ). We discuss empirical properties of the count process Y t of model (6) in Section S-2 of the SM. Moreover, ξ t = Y t − exp(ν t ) is a MDS. We define the log-linear PNAR(p) by

using the same notation as before. The interpretation of this model is the same as of the linear model. Furthermore,

where G h = β 1h W + β 2h I N for h = 1, . . . , p. The following results are complementing Prop. 1-2

and Theorem 1 proved for the case of log-linear model.

Proposition 3. Consider model (9), with fixed N . Suppose that ρ( p h=1 |G h | v ) < 1. Then the process {Y t , t ∈ Z} is stationary and ergodic with

The result follows from Debaly and Truquet (2019, Thm. 5) . Analogously to the linear model, we need to show the uniform boundedness of moments of the process and the stationarity of the model with increasing dimension. Since the noise ψ t is approximately MDS, the following result is proved by employing approximate arguments.

Proposition 4. Consider model (9) and | p h=1

Analogously to Thm 1, a strict stationarity result for network of increasing order is given for the log-linear PNAR model (9).

Theorem 2. Consider model (9). Assume p h=1 (|β 1h | + |β 2h |) < 1 and N → ∞. Then, there exists a unique strictly stationary solution {Y t ∈ N N , t ∈ Z} to the log-linear PNAR model, with max i≥1 E |Y i,t | r ≤ C r < ∞, and max i≥1 E[exp(r |ν i,t |)] ≤ D r < ∞, for all r ≥ 1.

Remark 2. Although, for simplicity, model (4) has been specified without covariates, time-invariant positive covariates Z ∈ R d + are allowed to be included without affecting the results of the present contribution, under suitable moments existence assumptions. These may be useful, for example, in order to consider available node-specific characteristics. Moreover, the log-linear version (9) ensures the inclusion of covariates whose values belong to R d .

Remark 3. Analogous arguments made in Remark 1 for the linear model case hold true for the log-linear model (8) and a log-linear GNAR(p) can be advanced.

The aim of this section is to establish inference for the unknown vector of parameters of models (4),(9), denoted by θ = (β 0 , β 11 , . . . , β 1p , β 21 , . . . , β 2p ) T ∈ R m , where m = 2p + 1. We approach the estimation problem by using the theory of estimating functions; see Basawa and Prakasa Rao (1980) , Zeger and Liang (1986) and Heyde (1997) , among others. In particular, developing proofs of consistency and asymptotic normality of the Quasi Maximum Likelihood Estimation (QMLE) is the main goal of the present section. Since, in such framework, W is a non-random sequence of matrices indexed by N , the specification of the asymptotic properties of the estimator deals with to diverging indexes, N → ∞ and T → ∞, allowing to establish a double-dimensional-type of converge, when both the temporal size and the network dimension grow together.

Define the conditional quasi-log-likelihood function for the vector of unknown parameters θ:

which is the log-likelihood one would obtain if time series modelled in (4), or (9), would be contemporaneously independent. This simplifies computations but guarantees consistency and asymptotic normality of the resulting estimator. Although the joint copula structure C(. . . , ρ) and the set of parameters ρ, are not included in the maximization of the "working" log-likelihood (10), this does not mean that the QMLE is computed under the assumption of independence; this is easily detected by the shape of the information matrix (15) below, which depends on the true conditional covariance matrix of the process Y t . Douc et al. (2017) , among others, established inference theory for QMLE for observation driven models, under the standard asymptotic regime (T → ∞). Assuming that there exists a true vector of parameter, say θ 0 , such that the mean model specification (4) (or equivalently (9)) is correct, regardless the true data generating process, then we obtain a consistent and asymptotically normal estimator by maximizing the quasi-log-likelihood (10). Denote byθ := arg max θ l N T (θ), the QMLE for θ. The score function for the linear model is given by

where ∂λ t (θ)

is a N × m matrix and D t (θ) is the N × N diagonal matrix with diagonal elements equal to λ i,t (θ) for i = 1, . . . , N . The Hessian matrix is given by

with C t (θ) = diag Y 1,t /λ 2 1,t (θ) . . . Y N,t /λ 2 N,t (θ) and the conditional information matrix is

where Σ t (θ) = E(ξ t ξ T t |F t−1 ) denotes the true conditional covariance matrix of the vector Y t and recalling ξ t ≡ Y t − λ t . Expectation is taken with respect to the stationary distribution of {Y t }.

Moreover, the theoretical counterpart of the Hessian and information matrices, respectively, are the following.

Similarly for the log-linear PNAR model, we have that the score function is given by:

where

is a N × m matrix, and

where D t (θ) is the N × N diagonal matrix with diagonal elements equal to exp(ν i,t (θ)) for i = 1, . . . , N and Σ t (θ) = E(ξ t ξ T t |F t−1 ) with ξ t = Y t − exp(ν t (θ)). Moreover,

are respectively (minus) the Hessian matrix and the information matrix.

. We drop the dependence on θ when a quantity is evaluated at θ 0 . Moreover, define λ max (X) the largest absolute eigenvalue of a symmetric matrix X,

Consider the set Ω d = {(2, 2, 2), (2, 2, 3), (2, 3, 3), (3, 3, 3)} and (j * , l * , k * ) = arg max j,l,k N −1 N i=1 ∂ 3 l i,t (θ)/∂θ j ∂θ l ∂θ k . To establish consistency and asymptotic normality of the QMLE, the following conditions and preliminary results are required.

B2 Let W be a sequence of matrices with nonstochastic entries indexed by N .

B2.1 Consider W as a transition probability matrix of a Markov chain, whose state space is defined as the set of all the nodes in the network (i.e., {1, . . . , N }). The Markov chain is assumed to be irreducible and aperiodic. Further, define π = (π 1 , . . . , π N ) T ∈ R N as the stationary distribution of the Markov chain, where π i ≥ 0, N i=1 π i = 1 and π = W T π. Furthermore, assume that

B2.2 Define W * = W + W T as a symmetric matrix. Assume λ max (W * ) = O(log N ) and

. Assume the following limits exist: d 1 = lim N →∞ N −1 tr (Λ),

Assumption B1 is a crucial assumption when processes with dependent errors are studied (see Doukhan (1994) ) as the α-mixing condition is a measure of asymptotic independence of the process.

Condition B1 implies that the errors ξ t = Y t − λ t are α-mixing, with the same sigma algebra F N t , and it is weaker than the IID assumption made on the errors in the previous literature (Zhu et al., 2017 (Zhu et al., , 2019 . In particular, the process Y t (or equivalently ξ t ) is an α-mixing array if, namely,

and it is clear that the dependence between two events A and B tends to vanish as they are separated in time, uniformly in N . Moreover, note that no rate of decay for the dependence measured by α(J) along time is specified, as a consequence, the α-mixing process can depend on several lags of its past before becoming "asymptotically" independent. This assumption holds true for the simple example of ξ t ∼ IID(0, Σ) where ξ t is constructed by the copula method proposed in this paper.

In this case, the noise is independent over time but it is non contemporaneous independent. Another example would be all the processes which satisfy

Assumption B2 requires some uniformity conditions on the network structure, and it is equivalent set of conditions as Zhu et al. (2017, C2) and Zhu et al. (2019, C2 .1-C2.2). Moreover, B2.2 requires that the network structure admits certain uniformity property (λ max (W * ) diverges slowly). Zhu et al. (2017, Supp. Mat., Sec. 7.1-7.3) found empirically that this is the case for several network models. In our case, regularity assumptions on the structure of dependence among the errors, when the network grows, are required by imposing that the diverging rate of λ max (Σ ξ ) will be slower than order O(N ), in B2.2, and its product with the squared sum of the stationary distribution of the chain, π, will tends to 0, in B2.1. We give an empirical verification of such conditions in Section S-3 of the SM. In the continuous-valued case introduced in Zhu et al. (2017) such assumptions are, in fact, not needed as the errors are IID with common variance σ 2 . Moreover, in this case, the absolute value is no more required because Σ ξ = E(ξ t ξ T t ) = σ 2 I N ; it can be understood since, in this case, D t = I N , so the second inequality in (A-5) is not needed any more; consequently, λ max (E(ξ t ξ T t )) = σ 2 , which reduces condition B2 to C2 in Zhu et al. (2017) , obtained as a special case.

The conditions outlined in B3 are law of large numbers-like assumptions, which are quite standard in the existing literature, since little is known about the behavior of the process as N → ∞.

These assumptions are required to guarantee that the Hessian matrix (12) converges to an existing limiting matrix (see (20) below). SM S-3 includes numerical study examples showing the validity of these limits. If OLS estimation with IID errors was performed, conditions B3 would correspond exactly to those in Zhu et al. (2017, C3) . In fact, recall that, in such framework, D t = I N in (14), so that N −1 tr (Λ) = 1, d 2 = κ 1 , d 3 = κ 2 and d 4 = κ 6 , where κ 1 , κ 2 , κ 6 are defined in C3 of Zhu et al. (2017) . A condition for the third derivative is assumed to guarantee valid large sample quasi-likelihood convergence.

Lemma 1. For the linear model (2), suppose β 1 + β 2 < 1 and B1-B3 hold. Consider S N T and H N T defined as in (11) and (12), respectively. Then, as {N,

Some preliminary lemmas employed to show the result are described in Appendix A.6. The proof of Lemma 1 follows in Appendix A.7.

Consider now the following conditions:

Assume that the following limits exist:

Condition B3 is simply an extension of assumption B3, required for the convergence of the conditional information matrix (13) to a valid limiting information matrix, see (22) below. More precisely, the reader can verify that B3 is just a special case of B3 , when Σ t = D t . The main reason that this assumption is introduced is that, for the QMLE, the conditional information matrix and the Hessian matrix are, in general, different. This does not occur in the case studied by Zhu et al. (2017) . Analogously to B3, when Y t is continuous-valued random vector, and we deal with IID errors ξ t , B3 reduces again to the conditions in Zhu et al. (2017, C3) .

Assumption B4 could be considered as a contemporaneous weak dependence condition. 

See Section S-4 of the SM, for an example where B4 is empirically verified. Define η ∈ R m , a non-null real-valued vector.

Lemma 2. For the linear model (2), suppose β 1 + β 2 < 1 and B1-B2, B3 -B4 hold. Consider S N T and B N T defined as in (11) and (13), respectively. Assume N −2 E(η T s N t ) 4 < ∞. Then, as

The condition N −2 E(η T s N t ) 4 < ∞ is not implied by B4. This condition is satisfied provided that (21) holds true for higher-order moments of the vectors {Y t }; See SM S-5 for a formal proof and more extensive discussion.

Theorem 3. Consider model (2). Let θ ∈ Θ ⊂ R m + . Suppose that Θ is compact and assume that the true value θ 0 belongs to the interior of Θ. Suppose that the conditions of Lemma 1-2 hold.

Then, there exists a fixed open neighborhood O(θ 0 ) = {θ : |θ − θ 0 | 2 < δ} of θ 0 such that with probability tending to 1 as {N, T N } → ∞, for the score function (11), the equation S N T N (θ) = 0 m has a unique solution, calledθ, which is consistent and asymptotically normal:

The extension of Theorem 3 to the general order linear PNAR(p) model is immediate, by using the well-know VAR(1) companion matrix; see (A-1). Assumptions B1-B2 and B4 remain substantially unaffected, using Debaly and Truquet (2021, Lemma 1.1). B3-B3 can be suitably rearranged similarly to Zhu et al. (2017, C4) and the result follows by Zhu et al. (2017, Supp. Mat., Sec. 4) . We omit the details.

Remark 4. A standard asymptotic inference result, with T → ∞, is obtained for the QMLEθ, where the "sandwich'" covariance is H −1 N B N H −1 N , by Theorem 3, as a special case, when N is fixed. This result requires only the stationarity conditions of Prop. 1, the compactness of the parameter space, and assuming that the true value of the parameters belongs to its interior. Such result is proved along the lines of Theorem 4.1 in Fokianos et al. (2020) . Similar comments apply also for the log-linear model below. The case, where T fixed and N diverging, cannot be studied in the framework we consider, since the convergence of the quantities involved in Lemmas 1-2 requires both indexes to diverge together. For details see also the related proofs in the Appendix A.7. This is empirically confirmed by some numerical bias found in the simulations of Sec. 4.1, when T is small compared to N .

Remark 5. It is worth pointing out that model (2) may be extended by including a feedback process such as

where J = α 1 W+α 2 I N and α 1 , α 2 ≥ 0 will be a network and autoregressive coefficients, respectively, for the past values of the conditional mean process. Such extension is suitable when the mean process λ t depends on the whole past history of the count process. When the network dimension is fixed, model (23) is just a special case of Fokianos et al. (2020, eq . 3), with a specific neighbor structure of the coefficients matrices therein. The stability conditions and asymptotic properties of the QMLE follow immediately. Note that (23) implies λ t = f (Y t−1 , Y t−2 , . . . , ), so all likelihood based quantities are evaluated recursively (for more, see Fokianos et al. (2020, eq . 12)), Therefore, when N → ∞, verification of Assumptions like B1-B4, which guarantee good large-sample properties of the corresponding estimators, is quite challenging problem because the dimension of the hidden process grows. See Appendix A.6 for comparison. Similarly, the log-linear model (6) can be extended by including the process ν t−1 in the right hand side but the same problem persists.

We now state the analogous result for the log-linear model (7) and the notation corresponds to

. Assumption B1 L is equal to B1 in the linear model. This holds also for B2 L , by considering

Assume the following limits exist:

B4 L There exists a non negative, non increasing sequence

The same discussion about the assumptions stated for the linear model applies in this case. In

, discovering again the conditions by Zhu et al. (2017) . Since the errors are independent, B4 L holds also in this case. Condition B4 L has been rescaled in terms of conditional covariances instead of correlations. This is simply due to the different form of the information matrix (19), which only includes the conditional covariance matrix Σ t . In contrast the linear model information matrix which corresponds to (15) is given

so that working with the correlations is more natural and convenient. A numerical analysis of assumptions B2 L -B3 L have been performed in SM S-3, and complement the results of the linear model. Recall that η ∈ R m , denotes a non-null real-valued vector.

Lemma 3. For the log-linear model (7), suppose |β 1 | + |β 2 | < 1 and B1 L -B4 L hold. Consider S N T and H N T defined as in (16) and (17)

where

Theorem 4. Consider model (7). Let θ ∈ Θ ⊂ R m . Suppose that Θ is compact and assume that the true value θ 0 belongs to the interior of Θ. Suppose that the conditions of Lemma 3 hold. Then, there exists a fixed open neighborhood O(θ 0 ) = {θ : |θ − θ 0 | 2 < δ} of θ 0 such that with probability tending to 1 as {N, T N } → ∞, for the score function (16), the equation S N T N (θ) = 0 m has a unique solution, calledθ, which is consistent and asymptotically normal:

The conclusion follows as for Theorem 3 above. An analogous result can be established for p > 1, since also log(1 N + Y t ) in (7) can be approximately rewritten as a VAR(1) model, similarly to (A-1).

In practical application one needs to specify a suitable estimator for the limiting covariance matrix of the quasi maximum likelihood estimators. To this aim, define the following matrix

The following results establish the inference for the limiting covariance matrix obtained by Theorems 3 and 4.

Theorem 5. Consider model (2) ( respectively, model (7)). Suppose the conditions of Theorem 3 (respectively, Theorem 4) hold true. Then, as {N,

These results allow to consistently estimate the covariance matrix of Theorems 3-4 using the

. We briefly discuss estimation of the copula parameter ρ. A thorough study of the problem requires separate treatment. Section S-8 of the SM contains results of a simulation study after employing a heuristic parametric bootstrap estimation algorithm. Such method potentially can be useful to select an adequate copula structure and estimate the associated copula parameter.

Finally, in Sec. S-6 of the SM, a novel regression estimator is proposed by considering a two-step Generalized Estimating Equations (GEE); the latter has been proved to be more efficient than the QMLE, based on numerical studies, especially when there exists considerable correlation among the counts.

We study the finite sample behavior of the QMLE for models (4) and (9). For this goal we run a simulation study with S = 1000 repetitions and different time series length and network dimension.

We consider the cases p = (1, 2). The adjacency matrix is generated by using one of the most popular network structure, the stochastic block model (SBM):

Example 1. (Stochastic Block Model). A block label (k = 1, . . . , K = 5) is assigned for each node with equal probability and K is the total number of blocks. Then, set P(a ij = 1) = N −0.3 if i and j belong to the same block, and P(a ij = 1) = N −1 otherwise. Practically, the model assumes that nodes within the same block are more likely to be connected with respect to nodes from different blocks.

For details on SBM see Wang and Wong (1987) , Nowicki and Snijders (2001) , and Zhao et al.

(2012), among others. The SBM model with K = 5 blocks is generated by using the igraph R package (Csardi and Nepusz, 2006) . The network density is set equal to 1%. We performed simulations with a network density equal to 0.3% and 0.5%, as well, but we obtained similar results, hence we do not report them here. The parameters are set to (β 0 , β 1 , β 2 ) T = (0.2, 0.3, 0.2) T . The observed time series are generated using the copula-based data generating process of Section 2.1.

The specified copula structure is Gaussian, C Ga R (. . . ), with correlation matrix R = (R ij ), where R ij = ρ |i−j| , the so called first-order autoregressive correlation matrix, henceforth AR-1. Then

. Tables 1 and Table 2 summarize the simulation results. For each simulated dataset, the QMLE estimation of unknown parameters has been computed by using the R package nloptr (Johnson) . It allows to run constrained optimization; for the linear model (4), for example, the quasi-log-likelihood (10) is maximized under the positive parameters constraint. Additional findings are given in Section S-7 of the SM-Tables S-1-S-4.

Then, the estimates for parameters and their standard errors (in brackets) are obtained by averaging out the results from all simulations; see the first tow rows of Tables 1-2. The third row below each coefficient shows the percentage frequency of t-tests rejecting H 0 : β = 0 at nominal level 5% and it is calculated over the S simulations. We also report the percentage of cases where various information criteria select the correct generating model. In this study, we employ the Akaike (AIC), the Bayesian (BIC) and the Quasi (QIC) information criteria. The latter is a special case of the AIC which takes into account that estimation is done by quasi-likelihood methods. See Pan (2001) for more details.

We observe that the estimates are close to the real values and the standard errors are small for all the cases considered. When there is a strong correlation between count variables Y i,t -see Table 1 -and T is small when compared to the network size N , then the estimates of the network effectβ 1 have slight bias. The same conclusion is drawn from Table S-1. Instead, when both T and N are reasonably large (or at least T is large), then the approximation to the true values of the parameters is adequate. This fact confirms the related asymptotic results obtained in Section 3 by requiring N → ∞ and T N → ∞. Standard errors reduce as T increases. Regarding estimators of the log-linear model (see Table 2 and S-3), we obtain similar results.

The t-tests and percentage of right selections due to various information criteria provide empirical confirmation for the model selection procedure. Based on these results, the BIC provides the best selection procedure for the case of the linear model; its success selection rate is about 99%; this is so because it tends to select models with fewer parameters. For the opposite reason, the AIC is not performing as well as BIC but still selects the right model around 92% of time.

The QIC provides a good balance between the other two information criteria; its value is around 95%. Moreover, it has the advantage to be more robust, especially when employed to misspecified models. This fact is further confirmed by the results concerning the log-linear model, even though the rate of right selections for the QIC does not exceed 88%. To validate these results, we consider the case where all series are independent (Gaussian copula with ρ = 0). Then QMLE provides satisfactory results if N is large enough, even if T is small (see Table S -2, S-4). Moreover, the slight bias reported, for some coefficients, when ρ > 0, is not observed in this case. Intuitively, the reason lies on the complexity of the network relations, which does not grow with N , since variables concerning different nodes are independent. Furthermore, the QMLE for this case coincides to the true likelihood function. From the QQ-plot shown in Figure S -13-S-14 we can conclude that, with N and T large enough, the asserted asymptotic normality is quite adequate. A more extensive discussion and further simulation results can be found in Sec. S-7 of the SM. (2), for various values of N and T . Network generated by Ex. 1. Data are generated by using the Gaussian AR-1 copula, with ρ = 0.5 and p = 1. Model (4) is also fitted using p = 2 to check the performance of various information criteria. 

The application on real data concerns the monthly number of burglaries on the south side of Chicago from 2010-2015 (T = 72). The counts are registered for the N = 552 census block groups.

The data are taken by Clark and Dixon (2021) , https://github.com/nick3703/Chicago-Data.

The undirected network structure raises naturally, as an edge between block i and j is set if the locations share a border. The density of the network is 1.74%. The maximum number of burglaries in a month in a census block is 17. The variance to mean ratio in the data is 1.82, suggesting there is some overdispersion in the data. The median of degrees is 5. On this dataset we fit the linear and log-linear PNAR(1) and PNAR(2) model. The results are summarized in Table 3 -4. All the models have significant parameters. The magnitude of the network effects β 11 and β 12 seems reasonable, as an increasing number of burglaries in a block can lead to a growth in the same type of crime committed in a close area. Also, the lagged effects have a positive impact on the counts.

Interestingly, the log-linear model is able to account for the general downward trend registered from 2010 to 2015 for this type of crime in the area analysed. All the information criteria select the PNAR(2) models, in accordance with the significance of the estimates. Estimation of the copula is advanced according to the algorithm of Sec. S-8 of the SM. As a preliminary step, we order the observations Y i,t for i = 1, . . . , N with respect to their sample vari-ance, in decreasing order. The Gaussian AR-1 copula, described in Sec. 4.1, is compared versus the Clayton copula, over a grid of values for the associated copula parameter, with 100 bootstrap simulations. The former is selected 100% and 97% of the times, for the linear and the log-linear PNAR (1) model, respectively. The estimated copula parameter isρ = 0.656 andρ = 0.546, for the linear and log-linear model, respectively, with small standard errors 0.046 and 0.058, correspondingly.

A further estimation step for the PNAR(1) models is performed by applying the two-step GEE estimation method introduced in Sec. S-6. The previous QMLE estimates are used as starting values of the two-step procedure. An AR-1 working correlation matrix P(τ ) is selected, withτ 1 as the estimator of the correlation parameter. To compare the relative efficiency of the GEE (θ)

versus QMLE (θ), the bootstrap standard errors have been performed, with 100 simulations, for both estimation methods, by using the estimated copula to generate bootstrap replicates; then we compute the ratio of the standard errors obtained, q(θ,θ) = m h=1 SE(β h )/ m h=1 SE(β h ). The results are q(θ,θ) = 1.022 and q(θ,θ) = 1.003, for the linear and log-liner model, respectively. We note a marginal gain in efficiency from the GEE estimation; this is probably due the a small value of the estimated correlation parameter τ , which is found to be around 0.005 and 0.003, on average, for linear and log-linear model, respectively. Using different kind of estimator for the correlation parameter might yield significant efficiency improvement but a further study in this direction is needed.

Recall that C is a generic constant and C r is a constant depending on r ∈ N. See also the notation paragraph in the introductory Section 1.

It is easy to derive some elementary properties of the linear PNAR(p) model.

where ξ t is a martingale difference sequence, and rearrange it in a N p-dimensional VAR(1) form by

For model (A-1) we can find the unconditional mean E(

where vec(·) denotes the vec operator. For details about the VAR(1) representation of a VAR(p) model and its moments, see Lütkepohl (2005) . Define the selection matrix J = (I N : 0 N,N : · · · : 0 N,N ) with dimension N × N p. Moreover,

where the first equality follows by ρ( p h=1 G h ) < 1 and the second one is true since W1 N = 1 N , by construction, and so E1 N = p h=1 (β 1h + β 2h )1 N ; the iteration of this argument j times gives the result. Then, the following proposition holds.

Proposition A.1. Assume p h=1 (β 1h + β 2h ) < 1 in model (3). Then, the PNAR(p) model has the following unconditional moments:

Applying these results to model (1) (equivalently (2)), we obtain

The mean of Y t depends on the network effect β 1 and the momentum effect β 2 but not on the structure of the network; this is true in the case the covariates are not present (Zhu et al., 2017, Case 1) . By contrast, the network structure always has an impact (via W) on the second moments; in addition, the conditional covariance Σ t shows that it depends on the copula correlation. Equations (A-2) are analogous to equations (2.4) and (2.5) of Zhu et al. (2017, Prop. 1) , who studied the continuous variable case. Then, the interpretations (Case 1 and 2 pp. 1099-1100) and the potential applications (Section 3, p. 1105) apply also here for integer-valued case.

For clarity in the notation, we present the result for the PNAR(1) model, but it can be easily extended to hold true for p > 1. By β 1 +β 2 < 1 and (A-2), we have that E (Y i,t 

where the last inequality follows for the stationarity of the process {Y t , t ∈ Z} and the finiteness of its moments, with fixed N . As max 1≤i≤N E |Y i,t | 2 is bounded by C 2 , for the same reason above

where the second inequality holds because of the conditional Jensen's inequality, and so on, for r > 3, the proof works analogously by induction and therefore is omitted.

Recall from Zhu et al. (2017, Def. 1 

For each ω ∈ W, let ω N = (ω 1 , . . . , ω N ) T ∈ R N be the its truncated N -dimensional version. By considering the VAR(1) representation for the PNAR(1) model (2), defined in Appendix. A.1, the process can be rewritten by backward substitution, Y t = (I N −

For sake of clarity we show the result for the PNAR(1) model. However, the general p-lags parallel result extends straightforwardly, by considering the companion VAR (1) representation form (A-1) of the linear PNAR(p) model. By Proposition 2, it holds that E(Y i,t ) ≤

Similar uniform bounds are obtained for moments of order r > 1. For any ω ∈ W,

Then, by Monotone Convergence Theorem (MCT), lim N →∞ ω T N Y t exists and is finite with probability 1, moreover Y ω t = lim N →∞ ω T N Y t is strictly stationary and therefore {Y t } is strictly stationary, following Zhu et al. (2017, Def. 1) . To verify the uniqueness of the solution, take another stationary solutionỸ t to the PNAR model with finite moments of any order. Then,

, for any N and weight ω. Since m is arbitrary, Y ω t =Ỹ ω t with probability one.

For simplicity set p = 1. Since ψ t is approximately MDS and |β 1 + β 2 | < 1, an approximated version of Prop. A.1 holds for Z t = log(1 N + Y t ) in (9), with suitable adjustments. Then, E(Z i,t ) ≈ µ, and a first order Taylor approximation provides E(

for all r ≥ 1. By assuming the existence of moments of order k ∈ N, i.e. max i≥1 E(Z i,t ) k < ∞, a we can obtain a more accurate approximation of polynomial order k. From the approximation above, E(Z i,t ) r ≤ C r and, then, E |ν i,t | r ≤ (|β 0 | + (|β 1 | + |β 2 |)C 1/r r ) r := C ν r , for all r ∈ N. The existence of the latter moments allows to perform a Taylor approximation for the function exp(r |ν i,t |) of any arbitrary order on |ν i,t |, around its mean, leading to the conclusion E(exp(r |ν i,t |)) ≤ D r , ∀r ≥ 1.

By Prop. 3, ω T N Y t is strictly stationary, for any N and all the moments of Y t exist by Prop. 4. Then, E ω T N Y t < ∞ and Y ω t = lim N →∞ ω T N Y t exists with probability one and is stationary. Hence, {Y t } is strictly stationary, following Zhu et al. (2017, Def. 1) . To prove the uniqueness of the solution, note that the proof of Theorem 1 applies to Z t = log(1 N + Y t ) = β 0 + GZ t−1 + ψ t , by suitable adjustments, such as, |G| j v 1 N = (|β 1 | W + |β 2 |) j 1 N . Therefore, {Z t } is strictly stationary, in the sense of Zhu et al. (2017, Def. 1) , and unique stationary solution to the log-linear PNAR model.

The same holds for the process {Y t = exp(Z t ) − 1 N } since it is a one-to-one deterministic function of the unique solution.

Before proving Lemmas 1-2 we introduce the following preliminary results.

Lemma A.1. For model (2) assume β 1 +β 2 < 1 and that the conditions B2 hold. Then, there exists K > 0 such that for any integer n > 0, |G n | v n K (β 1 + β 2 ) n M where M = C1π T + K j=0 W j , C > 1 is a constant and π is defined in B2.1. Moreover, for integers 0 ≤ k 1 ≤ 1 and j > 0, 

, then we need to prove the limit

using Cauchy-Schwartz inequality. This means one just needs to prove

For the first term of (A-3), by applying the spectral decomposition on Σ ξ we have π T Σ ξ π =

is an orthogonal matrix whose columns are orthonormal eigenvectors of Σ ξ and Λ ξ is a diagonal matrix with its eigenvalues. Then, for Cauchy inequality, it holds that q(z 

where the second inequality is due to Zhu et al. (2017, Supp. Mat., p. 7), with W * = W + W T and the convergence follows by B2.2.

and f (Y t−1 , θ) = λ t = β 0 + GY t−1 . Define the following predictors, for J > 0:

The first inequality holds for an application of the multivariate mean value theorem. Moreover,

and the last inequality comes from the previous recursion. It is immediate to see that, for t − J < 0,

We split the proof accordingly to each single result given in Lemma 1.

A.7.1 Proof of (1) t ) 1≤r,l≤m . We take the most complicated element, h 22,t , the result is analogously proven for the other elements. Define l 1,

Set 1/a + 1/b = 1/2 and 1/q + 1/p + 1/n = 1/a. By Cauchy-Schwartz inequality, as w ij > 0 for j = 1, . . . , N and N j=1 w ij = 1 we have that

By an analogous recursion argument, it holds that max i≥1 Ŷ 2

In the same way we can conclude that max i≥1 l 2,i,t q < l 2 < ∞ and max i≥1 l 3,i,t q < l 3 < ∞. Then, by Minkowski inequality

where c N t = m r=1 m l=1 η r η l c rl and ν J = d J−1 → 0 as J → ∞, establishing L p -near epoch dependence (L p -NED), with p ∈ [1, 2], for the triangular array Andrews (1988) . Moreover, by a similar argument above, it is easy to see that E X N t 2 < ∞, by the finiteness of all the moments of the process Y t . Then, using B1 and the argument in Andrews 

Note that the linear model (2) can be rewritten has

by assumption B3. The second term 

For all non-null η ∈ R m , the triangular array η T s N t /N : 1 ≤ t ≤ T N , N ≥ 1 is a martingale difference array. Moreover, E(η T s N t /N ) 2 < ∞, by Cauchy inequality and the boundedness of all the moments of {Y t }. Then, η T s N t /N is trivially a uniformly integrable L 1 -mixingale. An application of Andrews (1988, Thm. 2) provides the result.

From θ ∈ O(θ 0 ), we have β 0, * ≤ β 0 ≤ β * 0 , where β 0, * , β * 0 are suitable positive constants. Consider the third derivative

Take the maximum of the third derivatives among {i, l, k} to be, for example, at θ j * = θ l * = θ k * = β 1 , the proof is analogous for the other derivatives, 

. Then point (3) of Lemma 1 follows by the last limit of B3. We omit the details.

Analogously to A.7, we address separately each point of Lemma 2.

A.8.1 Proof of (1)

, which are the elementwise conditional covariances and correlations, respectively. Then

The second inequality is obtained employing the arguments used for the element h 22,t of the Hessian as in A.7. Moreover, r 1,i,j,t = (w T i Y t−1 )(w T j Y t−1 )(λ j,t +λ j,t ) and r 2,i,j,t = λ i,t λ j,t (w T i Y t−1 +w T i Y t−1 t−J ). Set 1/q + 1/h = 1/b. Note that max i,j≥1 r 1,i,j,t q < r 1 < ∞, max i,j≥1 r 2,i,j,t q < r 2 < ∞ by the same argument of max i≥1 l 1,i,t a < l 1 above. When i = j, σ iit = λ i,t , consequently,

which is bounded by 2Φ and the first inequality is a consequence of B4. Then, ∀i, j = 1, . . . , N , we

Here again ν J = d J−1 . Then, the triangular array

with EX 2 N t < ∞, and Theorem 2 in Andrews (1988) holds for it. This result and B1 yield to the convergence

as {N, T N } → ∞, for any non-null η ∈ R m . The existence of the limiting information matrix (22) follows the same methodology used in A.7.1 for the existence of (20), by considering B3

instead of B3. The same notation B N = (B k,l ) k,l=1,...,m and the same splits for each elements of the information matrix are adopted. So we highlight only the element which is different, i.e.

which converges to 0, as N → ∞, following (A-5).

Now we show asymptotic normality. Define ε N t = η T ∂λt ∂θ T D −1 t ξ t , and recall the σ-field

for any δ > 0, as N → ∞. By the result in equation (A-6)

for any δ > 0, as N → ∞. Then, the central limit theorem for martingale array in Hall and Heyde (1980, Cor. 3 

, and an application of the Cramér-Wold theorem leads to the desired result.

Recall the quasi log-likelihood (10) and set K N (δ) = θ : |θ − θ 0 | 2 ≤ δ/ √ N T N a compact neighborhood of θ 0 , for any δ > 0. A Taylor expansion gives 

2 ≤ C < ∞ by Lemma 2. By combining all the above and following Fokianos and Tjøstheim (2012, Proof of Thm. 1, p. 1224) we obtain that there exists, with probability tending to one, a solution to the system S N T N (θ) = 0 m , denoted byθ, in the interior of K N (δ). Since all elements of (12) are positive, we have ν T H N T N (θ 0 )ν > 0, for any non null ν ∈ R m . Then,θ is unique solution in the interior of K N (δ). The same argument applies for any 0 < δ 1 < δ, i.e. there exists with probability tending to one a solution to the score equations in K N (δ 1 ). Butθ is the unique solution to the score equations in K N (δ) and therefore lies in K N (δ 1 ) with probability tending to one. Thenθ is consistent. The asymptotic normality follows by a Taylor expansion of the score and Lemmas 1-2.

We omit the details.

The proof is analogous to that of Lemmas 1-2. We will point out only the parts which differ substantially.

Lemma A.3. Define Z t = log(1 + Y t ) and set d = |β 1 | + |β 2 |. Rewrite the linear model (7) 

Proof. The proof is analogous to Lemma A.2 and therefore is omitted.

A.10.1 Proof of (1)

. The second inequality follows by |exp(x) − exp(y)| = |exp(y)(exp(x − y) − 1)| and |(exp(x − y) − 1)| ≤ exp(|x − y|) |x − y| ≤ exp(|x| + |y|) |x − y|, for x, y ∈ R. Set 1/a + 1/b = 1/2 and 1/q + 1/p + 1/n = 1/a. It is

All these quantities are bounded by Prop. 4. Similarly to the linear model, Lemma A.3 entails

Similarly, max i≥1 ν i,t q are bounded. Similarly to the Proof of Prop. 4, the existence of the latter moments allows to perform a Taylor approximation for the function exp(q |ν i,t |) of any arbitrary order on |ν i,t |, around its mean, leading to exp(|ν i,t |) q < ∞, ∀q ≥ 1, max i≥1 c 1,i,t q < c 1 < ∞ and max i≥1 c 2,i,t q < c 2 < ∞.

Then, h 22,t − h t 22,t−J 2 ≤ c 22 ν J is L p -NED and, by Assumption B1 L , the conclusion follows as for the linear model. The proof of existence of the matrix H in (25), follows by B2 L and is a special case of the existence of the matrix B, showed in the next point.

A.10.2 Proof of (2)

. Working analogously as before

, for B4 L , similarly to the linear model, proving L p -NED.

We prove the existence of the matrix B as in (25), using the properties of the network (B2 L ).

12b /N , where B 12a = µB 11 → µµ * y and B * 12b /N = H 12b /N + B 12b /N , where H 12b /N → l 1 , by B3 L , and B 12b /N → 0, by B2 L , similarly to the linear model, as N → ∞. B 22 /N = B 22a /N +B 22b /N +B 22c /N +B 22d /N , where B 22a /N = µ 2 B 11 → µ 2 µ * y , B 22b /N = B 22c /N = µB * 12b /N → µl 1 , and finally B 22d = N −1 tr[∆ L (0)] → g 5 , by, B3 L , as N → ∞. The other elements follow similarly. The proof of asymptotic normality is established in the same fashion of the linear model and therefore is omitted.

Consider the third derivative

Take, for example, the case θ * j = θ * l = θ * k = β 1 ,

The rest of the proof can be derived analogously to the proof of Lemma 1. We omit the details.

For any deterministic non-null vector η ∈ R m , ( 

where B N t andB N t are the single summands of (13) and (26), respectively. Furthermore, set

is a martingale difference array and, since E(J N t (θ 0 )) 2 ≤ 4E(η T s N t ) 4 < ∞, by the results of Theorem 3, the sequence J N t (θ 0 ) is a uniformly integrable L 1 -mixingale. By Andrews (1988, Thm. 2), The proof is analogous for the log-linear model (7), therefore is omitted.

The supplementary material contains further details on properties of linear and log-linear PNAR models. A more extended discussion about conditions of Lemma 2 and an empirical exploration of assumptions B2-B4 are also provided. Additional simulation results are presented. A numerical study concerning a more efficient GEE estimator is included. Finally, guidelines for the estimation of the copula and its parameter are discussed.

To gain intuition for model (1), we simulate a network from the stochastic block model (Wang and Wong, 1987) ; see Figure 

We give here some insight on the structure of the model (6) above for the linear model. Here an explicit formulation of the unconditional moments is not possible for the count process {Y t }. We report the sample statistics to estimate the unknown quantities and replicate the same baseline characteristics and the same scenarios of the linear case. In Figure S-4 we can see that, analogously to the linear case, the correlations among counts grow when more activity in the network is showed.

However, here a more sparse matrix seems to slightly affect correlations. The general levels of correlations are higher than the linear case in Figure impact of negative coefficients on correlations. However, the sample means and variances decrease when compared to the corresponding plots produced using β 1 , β 2 > 0.

This section illustrates some numerical evidence verifying the network condition B2 and the convergence of the limits assumed in B3-B3 . Starting from B2.1, a sufficient condition for both irreducibility and aperiodicity of the Markov chain, whose states are the nodes of the network {1, . . . , N }, is that the network is always fully connected after a finite number of steps. Moreover, in B2.2, a sufficiently slow diverging rate of λ max (W * ) is required. These results have already been verified empirically in Zhu et al. (2017, Supp. Mat., Sec. 7) , therefore they are omitted here. The remaining conditions to be verified for B2 are i) λ max (Σ ξ ) = O((log N ) δ ), for some δ ≥ 1, and ii)

where π is the stationary distribution of the Markov chain defined in B2.1, π = lim k→∞ W k . To this aim, we consider the same simulation setting of Sec. 4.1, with the copula parameter ρ = 0.5, k = 1000, T = 200 and N = (200, 250, 300, 350, 400) . The number of simulations is S = 500. Condition i) requires that λ max (Σ ξ ) < C(log N ) δ , where recall that C is a generic constant. We approximate Σ ξ by using the sample counterpart T −1 T t=1 ξ t ξ T t v and compute the maximum eigenvalue, to obtain µ ξ (N, δ) = λ max (Σ ξ )/(log N ) δ , for various values of δ. The experiment is replicated S times for µ ξ (N, δ) , whose values are box-plotted in the left plot of Figure S -8, with δ = 8; we observe that its values are well bounded by a positive constant, when N grows. We obtain similar results for δ = (6, 7). When δ > 8, the approximation im-

The second boxplot in Figure S -8, shows that this is indeed the case, when N is increasing, for example, with γ = 1/2.

Regarding the limit convergence assumptions specified by B3-B3 , a similar numerical verification as above can be studied. We consider the most complex element N −1 tr[∆(0)], where tr[∆(0)] is substituted by the empirical counterpart, tr[∆ (0) 

The condition (21) in assumption B4 depends on the copula construction C(. . . , ρ) specified on the exponential waiting times X i,l of the data generating process (DGP) discussed in Section 2.1.

This structure of dependence, in turn, is transferred to the generated Poisson random variables Y i,t , through a non-deterministic transformation. While the copula is invariant to one-to-one deterministic transformations, we do not have such a transformation in this case. Then, it is not clear how to establish the form of the conditional covariance (and correlation) of the counts, and so assumption B4 cannot be addressed theoretically. However, this problem appears in other settings. Indeed, similar difficulties arise even in alternative frameworks, like imposing the copula construction directly on Poisson marginals, see the discussion by Inouye et al. (2017) , among others. Furthermore, direct joint multivariate count distributions implicitly introduce strong constraints to the correlation structure of the counts. For instance the multivariate Poisson distribution described in Fokianos (2022, Sec. 2.1), has the property Cov(Y i,t , Y j,t | F t−1 ) = λ 0 > 0, ∀i = j, with λ 0 being a constant parameter.

It is still possible to give an empirical evidence where B4 is satisfied. For example, suppose the selected copula is Gaussian, with AR-1 correlation matrix, as specified in Sec. 4.1. Recall that the correlation matrix of such copula has single element R ij = ρ |i−j| . Clearly, by fixing i = 1, for example, and j = 1, . . . , N , we obtain a geometrically decaying pattern of correlations. However, R ij do not correspond to correlations of the observed count random variables, nor represent the correlations of the exponential inter-arrival times, in general. Then, we want to explore the impact of the copula correlation coefficients on the correlations of the exponential random variables first, and, in turn, on the Poisson ones. In this empirical study the network structure is given by the Erdős-Rényi Model (ER), but analogous results were obtained for the SBM of Example 1, with K = {2, 5}, and therefore are omitted.

Example S1. (Erdős-Rényi Model). Introduced by Erdös and Rényi (1959) and Gilbert (1959) , the network is constructed by connecting N nodes randomly. Each edge is included in the graph with probability p, independently from every other edge. In this example we set p = P(a ij = 1) = N −0.3 . Figure S-10 shows the results of a simulation study on the theoretical copula correlations specified versus the mean of empirical pairwise correlations, for the exponential random variables X i,t,l , with l = 1, . . . , 50, obtained by averaging out the correlations matrices along the simulations. The correlation structure of the copula appears to be essentially transferred to the correlations of the exponential waiting times. Figure S-11 depicts, instead, the resulting mean empirical pairwise correlations of the Poisson random variables Y i,t , obtained from S = 100 simulations. As expected, the correlation structure of the exponentials random variables does not correspond to the correlation structure of the Poisson ones. However, when magnitudes of the network (β 1 ) and autoregressive (β 2 ) effects are not too large, a detected decaying pattern towards zero of the empirical correlations is still inherited by the counts. This means that a non increasing sequence {ϕ h } h=1,...,∞ , such that ∞ h=1 ϕ h = Φ < ∞, satisfying (21), can always be found for the correlations of the Poisson variables. For example, the orange line in Table S -11, shows the graph of ϕ h = 6/h 1.001 , which clearly satisfies B4. Similar outcomes, not presented here, have been found for the Student's t copula, with 2 and 10 degrees of freedom. 

Define the normalized random process X t = D −1/2 t ξ t , so X i,t = (Y i,t −λ i,t )/ λ i,t , such that E(X i,t ) = 0 and E(X 2 i,t ) = 1. Recall (21), for i < j, which can be rewritten as |Cov(X i,t , X j,t | F t−1 )| ≤ ϕ j−i . Consider the following assumption.

B5 The sequence {ϕ h } h=1,...,∞ defined in B4 is such that ∞ h=1 hϕ h = Φ 1 < ∞ and, for i < j < k < l, almost surely

Proposition S1. Consider model (4) and the score (11). If conditions B4-B5 hold, then, for all 

where C ≥ 0 is a universal constant and the inequality follows by an application of Yaskov (2015, Thm. 2.1) with conditional expectation.

which is the Hessian matrix squared (up to a constant), and it can be shown to be uniformly bounded, by standard Cauchy inequalities and the boundedness of all the moments of Y t .

A similar result is established for the log-linear model (Lemma 3), by adding the extra assumption exp(ν i,t ) ≥ c 0 , for i = 1, . . . , N , where c 0 > 0 is a constant. Indeed, by (24), for i < j, |Cov(X i,t , X j,t | F t−1 )| = |Cov(Y i,t , Y j,t | F t−1 )| /( exp(ν i,t ) exp(ν j,t )) ≤ φ j−i /c 0 := ϕ j−i .

We have developed a complete theoretical framework for QMLE inference. However, in general, the QMLE is computed under the independence assumption and therefore it will suffer efficiency loss. The lack of efficiency might become large especially when large correlations among the count processes are present. Some empirical evidence regarding this problem has been found in numerical studies of Sec. 4.1 and S-7. Since, in practice, the dependence structure among the integer-valued random variables is modelled through a copula construction, with a parameter ρ, the problem can be intuitively viewed on examining the performance of the QMLE when the copula parameter ρ takes large values. We propose here a two-step Generalized Estimating Equation (GEE) procedure, originally introduced for longitudinal data analysis (Zeger and Liang, 1986) , which adds a further estimation step to the previous QMLE methodology. Consider model (4), for example. The proposed QMLE maximizing the quasi log-likelihood (10) is the m-dimensional vector of unknown parameterθ which solves the system of equation S N T (θ) = 0 m , where S N T is defined in (11). The GEE estimatorθ will be then the vector of roots of the following system.

where V t (θ, r) −1 = D t (θ) −1/2 P(τ ) −1 D t (θ) −1/2 and P(τ ) is an N × N matrix, the so called working correlation matrix, depending on a correlation parameter τ . The matrix P(·) is specified by the researcher and it will not necessarily reflect the true contemporaneous correlation structure of the multivariate process Y t , which in fact depends on the copula and whose analytical form is not available. Moreover, the working correlation matrix should not be confused with the copula correlation matrix R which is specified on elliptical copulas, see for example Sec. 4.1, as the two matrices can be, in general, different.

Note that the QMLE can be obtained as a particular case of (S-1), when P(τ ) = I N , recovering the score function (11). Intuitively, the estimator obtained by (S-1), which incorporates a correlation structure among multivariate process Y t , by introducing the matrix P(τ ), is expected to explain a larger part of the variability connected to the process of study. Therefore, (S-1) should lead to improved efficiency results, compared to the QMLE with independence likelihood (10). In summary, we propose the following two-step GEE estimator.

1. Perform the QMLEθ of model (4). Select a structure for P(τ ) and computeτ = τ (θ).

2. Estimateθ from (S-1), with working correlation P(τ ).

There are two problems related to this methodology: i) choosing a suitable form for the working correlation matrix, among several available alternatives; see for example Pan (2001, p. 121) , ii) find a suitable estimator for the parameter τ . For problem i), we note the following. Since the network dimension N may be large, the inversion of the N × N matrix P(·) may be extremely expensive and, ultimately, computationally unfeasible. Furthermore, since only the inverse of such matrix would be required by the proposed estimating equations, we suggest to find a correlation structure where an analytical form for the inverse is currently known. For example, in this work we decide to impose the AR-1 correlation structure P(τ ) = (P ij ) where P ij = τ |i−j| , for i, j = 1, . . . , N and i = j. Such working correlation matrix is appealing since it has an analytical form of the inverse, i.e. P −1 (τ ) = 1/(1 − τ 2 )T(τ ), where T(τ ) is a tridiagonal matrix, with the main diagonal consisting of the elements of the N × 1 vector (1, 1 + τ 2 . . . , 1 + τ 2 , 1), and the remaining two diagonals are the elements of the (N − 1) × 1 vector of (−τ, . . . , −τ ); see Sutradhar and Kumar (2003, eq. 1.1) . The estimation of the correlation parameter is usually performed using moment estimators (Zeger and Liang, 1986, Sec. 4) . We select two estimators,τ 1 = 2 N i=1 j>iτ ij (θ) and τ 2 = max i,j=1,...,N, j>iτij (θ), wherê

are the empirical Pearson correlations from the quasi-likelihood estimation.

To check the performance of the two-step GEE estimator we run a simulation study, by generating data from model (2) using the same setting of the simulation study outlined in Sec When the data are generated with a moderate or strong correlation (copula parameter ρ > 0.5) the GEE is relatively more efficient than the QMLE. The improvement in efficiency is larger as the correlation becomes stronger and tends to grow even by increasing network and time size (N, T ).

This would be expected since the QMLE becomes a poor approximation of the true likelihood as the dependence structure of the multivariate count process Y t becomes stronger, whereas the specified GEE methodology appears to be able to account for a significant part of the correlations among the counts, even though the working correlation does not reflect the true correlation structure of the data. Similar results are obtained by comparing the marginal efficiencies e(β h ,β h ), with h = 0, 1, 2, therefore are omitted. The employment of the moment estimatorτ 1 gave analogous results but with gain in efficiency less than the gain obtained by usedτ 2 . The results of the current section can be established similarly for the log-linear version of the model.

Although the result of the present simulation study are encouraging, the addressed problem is only at the initial stage and further studies are required. For example, alternative structure for the working correlation structure may be used, like the equicorrelation structure (EQC), P(τ ) =

(1 − τ )I N + τ J N , where J N is a N × N matrix of ones and an equal pairwise correlation τ is assigned to all the possible couples of the multivariate time series. This matrix also have analytical

For a proof see Rao (2002, p. 67 ); moreover, more complex correlation structures could be considered. In addition, further estimators for the correlation parameter τ may be available. Finally, the development of an asymptotic theory for the proposed GEE estimator would be of interest. We defer these extensions to a future contribution.

We present here further comments and findings from the simulation study reported in Sec. 4.1. In the situation of independence (ρ = 0) the QMLE reduces to the standard MLE. When N is big and T is small we see that QMLE provides satisfactory results (Table S -2, S-4). However, this is not always the case. Following the results of Sec. 4.1 and when ρ 0, a more complex structure of dependence among variables will be observed. Therefore the quasi likelihood (10) can be thought as a crude approximation to the true likelihood. In particular, when N → ∞ and T is small, care must be taken in the interpretation of obtained estimates. This fact is also confirmed by the Tables S-1 and S-3 who illustrate slightly worse results when there exists a stronger dependence among the count variables. Finally, if both the temporal size T and the network size N are reasonably large, then inferential results in Section 3 are confirmed.

Figure S-14 shows a QQ-plot of the standardized estimators for the log-linear model of order 1, with Gaussian copula (ρ = 0.5) and N = 100. When both dimensions are large, then the approximation is satisfactory. Clearly, by reducing dependence among count variables, we can obtain better large-sample approximations but these results are not plotted due to space constraints. (7), for various values of N and T . Network generated by Ex. 1. Data are generated by using the Gaussian AR-1 copula, with ρ = 0.9 and p = 1. Model (9) is also fitted using p = 2 to check the performance of various information criteria. 

Dependence relations among the observed count time series is equivalent to the problem of obtaining good estimates for the copula. We propose an heuristic parametric bootstrap algorithm for both identification of the copula structure C(. . . , ρ) and estimation of the unknown copula parameter ρ. We consider the case of one-parameter copula specifications. A thorough study of the problem will be discussed elsewhere. The methodology is based on parametric bootstrap and it is outlined below:

1. Given the observations, Y i,t , i = 1, . . . , N , estimateθ.

2. For a given a copula structure and for a given value of the copula parameter, generate a sample of marginal Poisson counts, Y b i,t , i = 1, . . . , N , with the data generating process introduced in Sec. 2.1, using the estimates from step 1.

4. Repeat step 2-3 for different copula structure and over a grid of values for the copula parameter. The obtained estimate for the copula structure and the associated copula parameter,ρ b , is the one which minimize the WMAE.

5. Repeat steps 2-4 for B times, where b = 1, . . . , B, to select the copula structure and parameter value minimizing the WMAE, giving the realizationsρ 1 , . . . ,ρ B . The chosen copula structure is the one that is selected most of the times. The final estimate of the copula parameter ρ is the average of copula parameters for such realizations (computed by only considering the realizations of the copula structure that is selected most of the times). Similarly, the associated standard error are computed only from these realizations.

Instead of using l mean square error (MSE), we preferred the mean absolute error (MAE), normalized by true observation, in symbols,

this quantity is less sensible to extreme values. Finally, to avoid the so called one divided by zero problem, which occur when zero or small denominators are encountered, the WMAE has been chosen as the minimizing criterion.

We run a small simulation study to show the effectiveness of the proposed estimation algorithm.

The network is obtained from the SBM model presented in Ex. 1, with K = 2 blocks. We set N = 100, T = 1000 and (β 0 , β 1 , β 2 ) = (1, 0.2, 0.1). Data are generated by linear and log-linear PNAR(1) models as in (2),(7), respectively, by using the DGP of Section 2.1, with a Gaussian AR-1 copula structure, as described in Sec. 4.1, where the true value of the copula parameter is ρ = 0.5. We run the bootstrap algorithm, with B = 500, comparing the Gaussian AR-1 copula and the Clayton copula. The parameters are picked on a grid of 17 equidistant values in the interval (0.1,0.9) for the Gaussian copula parameter, and (0.5,8) for the Clayton one. The results are summarized in Table S -5. The third row indicates the percentage selection of the right copula structure (Gaussian AR-1); we see that the right copula structure is selected the vast majority of the times, for both models. Within these realizations the obtained estimate for the copula parameter, displayed in the first row, is quite accurate (for both models). Finally the second row shows that the associated standard errors are quite small and confirm the significance of the estimates. Table S -5: Copula estimation for Gaussian AR-1 versus Clayton copula. Network generated as in Ex. 1, with K = 2. Data generated by models (2),(7) and Gaussian AR-1 copula, with ρ = 0.5.

Results based on B = 500 bootstrap replications. 

Contributionsà l'éconemétrie des séries temporellesà valeurs entières

Poisson QMLE of count time series models

First-order integer-valued autoregressive (INAR (1)) process

An integer-valued pth-order autoregressive structure (INAR (p)) process

Models and inference for correlated count data

Laws of large numbers for dependent non-identically distributed random variables

Observation-driven models for discrete-valued time series

Statistical Inference for Stochastic Processes

Endemic-epidemic models with discrete-time serial interval distributions for infectious disease prediction

The impact of sampling and network topology on the estimation of social intercorrelations

Quasi-likelihood inference for negative binomial time series models

A class of spatially correlated self-exciting statistical models

Space-time modelling with an application to regional forecasting. Transactions of the Institute of British Geographers

Statistical analysis of time series: some recent developments

The igraph software package for complex network research. InterJournal Complex Systems

Conditional maximum likelihood estimation for a class of observation-driven time series models for count data

Observation-driven models for Poisson counts

Handbook of Discrete-Valued Time Series

Theory and inference for a class of nonlinear models with application to time series of counts

Stationarity and moment properties of some multivariate count autoregressions

A note on the stability of multivariate non-linear time series with an application to time series of counts

Ergodicity of observation-driven time series models and consistency of the maximum likelihood estimator

Asymptotic properties of quasi-maximum likelihood estimators in observation-driven time series models

Mixing

On weak dependence conditions for Poisson autoregressions

On random graphs I

Multivariate self-exciting jump processes with applications to financial data

Group network Hawkes process

Integer-valued GARCH process

Multivariate count time series modelling

Partial likelihood inference for time series following generalized linear models

Poisson autoregression

Multivariate count autoregression

Log-linear Poisson autoregression

Nonlinear Poisson autoregression

A primer on copulas for count data

Random graphs. The Annals of Mathematical Statistics

Martingale Limit Theory and its Application

Modelling time series count data: an autoregressive conditional Poisson model

Multivariate autoregressive modeling of time series count data using copulas

Quasi-likelihood and its Application. A General Approach to Optimal Parameter Estimation

A review of multivariate distributions for count data derived from the Poisson distribution

The NLopt nonlinear-optimization package

Regression Models for Time Series Analysis

Generalized network autoregressive processes and the GNAR package

Modelling, detrending and decorrelation of network time series

Statistical Analysis of Network Data with R

The multivariate GINAR (p) process

Asymptotic normality and parameter change test for bivariate Poisson INGARCH models

Some models for time series of counts

New Introduction to Multiple Time Series Analysis

The identification of regional forecasting models using space: time correlation functions

Generalized Linear Models

Markov Chains and Stochastic Stability

Absolute regularity and ergodicity of Poisson count processes

Estimation and prediction for stochastic blockstructures

Akaike's information criterion in generalized estimating equations

A bivariate INAR(1) process with application

On composite likelihood estimation of a multivariate INAR(1) model

Some properties of multivariate INAR(1) processes

Dynamic Nonlinear Econometric Models

Linear statistical inference and its applications

A central limit theorem and a strong mixing condition

A Matrix Handbook for Statisticians

The inversion of correlation matrix for MA (1) process

Modeling, simulation and inference for multivariate time series of counts using trawl processes

Self-excited threshold Poisson autoregression

Stochastic blockmodels for directed graphs

Social Network Analysis: Methods and Applications

An Introduction to Discrete-valued Time Series

Stationarity of count-valued and nonlinear time series models

Variance inequalities for quadratic forms with applications

A regression model for time series of counts

Longitudinal data analysis for discrete and continuous outcomes

Consistency of community detection in networks under degreecorrected stochastic block models

Network GARCH model

Grouped network vector autoregression

Network vector autoregression

Network quantile autoregression

The authors would like to thank the Editor, Associate Editor and two reviewers for valuable com-