key: cord-0209669-19jxv4m7 authors: Gonçalves, Flavio B.; Dutra, Livia M.; Silva, Roger W. C. title: Exact and computationally efficient Bayesian inference for generalized Markov modulated Poisson processes date: 2020-06-17 journal: nan DOI: nan sha: d5a611ed08dde59dc8d309a6d0d3e60356ddb044 doc_id: 209669 cord_uid: 19jxv4m7 Statistical modeling of point patterns is an important and common problem in several areas. The Poisson process is the most common process used for this purpose, in particular, its generalization that considers the intensity function to be stochastic. This is called a Cox process and different choices to model the dynamics of the intensity give rise to a wide range of models. We present a new class of unidimensional Cox process models in which the intensity function assumes parametric functional forms that switch among them according to a continuous-time Markov chain. A novel methodology is proposed to perform exact Bayesian inference based on MCMC algorithms. The term exact refers to the fact that no discrete-time approximation is used and Monte Carlo error is the only source of inaccuracy. The reliability of the algorithms depends on a variety of specifications which are carefully addressed, resulting in a computationally efficient (in terms of computing time) algorithm and enabling its use with large datasets. Simulated and real examples are presented to illustrate the efficiency and applicability of the proposed methodology. A specific model to fit epidemic curves is proposed and used to analyze data from Dengue Fever in Brazil and COVID-19 in some countries. Point pattern statistical models aim at modeling the occurrence of a given event of interest in a given region, which is commonly interpreted as time in the unidimensional case. The most widely used point process model is the Poisson process (PP), in which the number of events in any region has a Poisson distribution and is independent for disjoint regions.
The dynamics of a Poisson process is mainly determined by its intensity function (IF); the process is called a homogeneous Poisson process when this function is constant. Cox processes are a statistically appealing generalization of the Poisson process that allows the intensity function to vary stochastically across the region under consideration. A variety of classes of Cox process models can be defined in terms of the stochastic dynamics that describes the variation of the intensity function. Several of those models have already been proposed in the literature, including non-parametric models in which the IF is described by a Gaussian process (Møller et al., 1998; Gonçalves and Gamerman, 2018) or a diffusion process (Gonçalves et al., 2020). A simple yet appealing class of models is that of the Markov modulated Poisson processes (MMPP), in which the IF follows a continuous-time Markov chain (CTMC). This means that the IF is piecewise constant with jumps having a Markovian dynamics. This class of models has been explored by several authors, with the most efficient solution to perform statistical inference proposed in Rao and Teh (2013). This paper proposes a generalization of Markov modulated Poisson processes, called the generalized MMPP (GMMPP), that allows the intensity function to jump among different and pre-specified functional forms. The jumps are determined by a continuous-time Markov chain in a way that each state of the chain is associated with one functional form. The model is actually specified in a way that self-jumps of the IF are allowed, meaning that the IF may restart in the same functional form. Furthermore, each functional form is allowed to depend on unknown parameters and the starting value of the IF in one functional form may vary among different visits of the chain to that state. This construction offers a quite flexible yet parametric solution to model the IF of unidimensional Cox processes.
An MCMC algorithm is developed to perform exact Bayesian inference for GMMPPs. It is exact in the sense that the devised Markov chain converges to the exact posterior distribution of all the unknown quantities of the model, including the IF. The algorithm builds upon the ideas introduced in Rao and Teh (2013) so that it scales with the number of Markov jumps and does not deteriorate substantially as the number of observations increases. Further non-trivial developments are proposed to circumvent the fact that a forward-filtering backward-sampling (FFBS) algorithm cannot be used to sample one of the blocks of the Gibbs sampling as is done in Rao and Teh (2013). In fact, one of the ideas developed here can be used to further improve the algorithm of Rao and Teh (2013). The proposed MCMC is computationally efficient in terms of computing time and, therefore, feasible to be used with very large datasets. This way, the main contributions of this paper are twofold: first, a novel class of parametric unidimensional Cox processes, flexible yet parsimonious, is proposed and, second, a computationally efficient MCMC algorithm is proposed to perform exact Bayesian inference. The proposed methodology offers an appealing (and much cheaper) alternative to non-parametric Cox processes in a variety of problems in which the latter would otherwise be a suitable choice. In order to motivate the use of GMMPPs, we consider two real datasets regarding coal mining disasters and the exchange rate between the Brazilian Real (BRL) and the US Dollar (USD). For the former, each event represents an explosion that killed ten or more men in Britain. For the latter, each event represents a day in which the variation w.r.t. the previous day was greater than 1%. A kernel method (see Diggle, 1985) is used to estimate the IF, as shown in Figure 1, and suggests that the IF ought to be well described, in both examples, by a GMMPP with an increasing and a decreasing line with varying starting values.
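As a rough illustration of the kind of kernel estimate referred to here (Diggle, 1985), the sketch below smooths the event times of a unidimensional point pattern with a Gaussian kernel and a simple reflection correction at the window boundaries. The function name, the reflection correction and the bandwidth choice are our own illustrative assumptions, not the paper's method.

```python
import numpy as np

def kernel_intensity(events, grid, bandwidth):
    """Gaussian-kernel estimate of the intensity of a 1-D point pattern:
    lambda_hat(s) = sum_i K_h(s - t_i), with events reflected at both
    boundaries of the observation window to reduce edge bias."""
    events = np.asarray(events, dtype=float)
    lo, hi = grid.min(), grid.max()
    # reflect events at both boundaries (simple edge correction)
    augmented = np.concatenate([events, 2 * lo - events, 2 * hi - events])
    diffs = grid[:, None] - augmented[None, :]
    K = np.exp(-0.5 * (diffs / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return K.sum(axis=1)
```

Such an estimate integrates approximately to the number of observed events over the window, which makes it a convenient visual guide for choosing functional forms.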
Both examples are revisited in Section 5. Finally, we propose a specific model to fit epidemic curves, allowing for asymmetry between the growth and decay behaviors. The model allows for simplifications in the proposed MCMC algorithm, which lead to reasonable computational times even for very large datasets - with more than 300. This paper is organized as follows. Sections 2 and 3 of the paper present the GMMPP and the proposed MCMC algorithm, respectively. Section 4 explores some simulated examples to discuss the efficiency of the proposed methodology. In particular, the proposed methodology is compared to a non-parametric Cox process approach in terms of inference and computational cost. Finally, Section 5 applies the methodology to some real datasets. Two of them are the ones presented in Figure 1, regarding coal mining disasters and the BRLxUSD exchange rate. A third example considers datasets regarding Dengue Fever epidemics in Brazil and the COVID-19 pandemic, for which a specific GMMPP model is carefully designed. Let Y := {Y (s)} s∈R + be a non-homogeneous Poisson process (NHPP) with intensity function λ := {λ(s)} s∈R + and consider K functional forms g k , k = 1, . . . , K, to be assumed by λ along R + . These may be, for example, constants, increasing or decreasing lines, exponential functions, etc. The IF λ switches among the different functional forms according to the transitions of a continuous-time Markov chain X := {X(s)} s∈R + , with Q-matrix Q θ , initial distribution π 0 and state space {1 : E} := {1, 2, · · · , E}, for E ≥ K, where θ is the vector of parameters indexing Q θ . Furthermore, in its most general form, we allow the IF to switch from the same functional form to itself and to have different starting values every time a functional form is revisited. We call the resulting process Y a generalized Markov modulated Poisson process, with mathematical representation given as follows.
Define T = (T 1 , T 2 , · · · ) as the jump times of X and Z = (Z 1 , Z 2 , · · · ) as the corresponding sequence of visited states, i.e. Z i = X(T i ), i ∈ N, and Z 0 = X(0). Then, for each s, the surjective function h : {1 : E} → {1 : K}, with h s := h(X(s)), assigns a functional form g k to each of the states of X, δ s := δ s (X) = max{T i : T i ≤ s} is the last jump time up to s, R s is the starting value of the IF at that jump time and, finally, ψ is a vector of parameters indexing the g k 's. For example, suppose that g 1 is a straight line with slope β; then, for a given s such that the IF assumes the functional form g 1 at s, we have that λ(s) = R s + β(s − δ s ). Naturally, the intensity function λ is required to be non-negative. Formally, we deal with this issue by assigning zero to the density of Y conditional on any trajectory of the IF that assumes negative values. The prior on the starting values R s is presented in Section 3. In the simpler case in which the IF is not allowed to switch from each g k to itself, we set h(k) = k, k = 1, . . . , K. On the other hand, if that feature is allowed, we set h(k) = h(K + k) = k, k = 1, . . . , K. In order to favor model identifiability in a statistical context, some entries of the Q θ matrix are set to zero so that a jump to the k-th functional form that is not a self-jump can only happen through the k-th state of X and never through the (K + k)-th one. Figure 2 illustrates the proposed class of models by presenting a realization of each of four different models. Figure 2: Four examples of GMMPPs. I-two functional forms: constant and increasing line. No self-jumps allowed and no varying starting value. II-two functional forms: increasing and decreasing lines. Self-jumps allowed and no varying starting value. III-two functional forms: constant and increasing line. No self-jumps allowed and varying starting value. IV-two functional forms: constant and increasing line. Self-jumps allowed and varying starting value.
The colour gray refers to the CTMC trajectory and the circles represent the events from the Poisson process. We aim at performing inference for GMMPPs based on the observation of the process over a finite-length time interval [0, S]. The proposed methodology is meant to be exact in the sense that no discrete-time approximation of the original process is considered. In particular, we shall perform Bayesian inference via an MCMC algorithm that has the exact posterior distribution of all the unknown quantities in the model as its invariant distribution. As a result, Monte Carlo error is the only source of inaccuracy. The main aim of the inference process is to obtain the posterior distribution of the intensity function and unknown parameters. Given the structure of the proposed class of models, this is equivalent to the distribution of (Z 0 , Z, T, θ, ψ|y), where y represents a realization of the process Y in [0, S]. In order to fully specify each model under the Bayesian approach, we need to assign a prior distribution to parameters θ and ψ. We define ψ = (ψ 1 , . . . , ψ K ), where ψ k is the set of parameters indexing the k-th functional form, θ · = {θ k : k = 1, · · · , K} as the rates of the waiting times of X and θ k· = {θ kj : j = 1, · · · , K and j ≠ k} as the transition probabilities from the states corresponding to the k-th functional form. The parameter vectors θ and ψ are assumed to be independent a priori. Independence among all the ψ k 's and among the components of θ · and vectors θ k· is also assumed. In the case that no self-jumps are allowed, the full prior specification of θ is completed by setting, for k = 1, . . . , K, θ k ∼ Gamma(α k , β k ), θ k· ∼ Dirichlet(γ k1 , . . . , γ kk−1 , γ kk+1 , . . . , γ kK ). (5) In the case in which self-jumps are allowed, each row of Q θ has K non-zero probabilities due to the restrictions imposed to ensure model identifiability.
Moreover, the transition probabilities between the two states corresponding to the same functional form are the same and the transition probabilities between the k 1 -th and k 2 -th functional forms are the same whether moving from state k 1 to k 2 or from state k 1 + K to k 2 . All this means that there are K transition probabilities associated with each functional form and the vector of these probabilities is assumed to follow a Dirichlet distribution as follows. For the k-th functional form, k = 1, . . . , K, we set θ k· ∼ Dirichlet(γ k1 , . . . , γ kK ). Furthermore, the rate parameters of the waiting times are the same for the k-th and (k + K)-th states. In order to illustrate all the restrictions imposed on the Q θ matrix, consider an example with three functional forms, all allowed to self-jump; the resulting Q θ matrix is then given as follows. Let R be the set of starting values of the intensity function at all the jump times of X in [0, S]. Note that the dimension of R is random and depends on the number of jumps. Therefore, the prior distribution on R is defined conditional on (T, Z 0 , Z), as in (7)-(8), where |T | is the number of jumps in [0, S] and R i is the starting value of the IF at T i . The prior distribution in (7)-(8) assumes a structure of conditional independence among all the starting values and identical distribution among all the starting values referring to the same functional form. Furthermore, in order to have a feasible MCMC algorithm, the constant functional form is the only one for which a continuous prior can be adopted, in particular, a Gamma distribution. For all the other forms, a discrete prior must be adopted and, unless useful information is available, we shall assume uniform discrete priors on supports chosen according to the scale of the IF. A more flexible approach is to fix the size of the support of this discrete distribution and treat the actual support values as unknown, assigning them a joint continuous prior.
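To make the constraints on Q θ concrete, the sketch below (our own illustration; build_q_matrix and its argument names are not from the paper) assembles the 2K x 2K generator for the self-jump case: states k and K + k share the same rate and the same transition probabilities, a non-self jump into form k2 can only enter through state k2, and a self-jump of form k moves between its two states so that the IF restarts in the same form.

```python
import numpy as np

def build_q_matrix(rates, probs):
    """Assemble the 2K x 2K generator of the self-jump GMMPP chain.
    rates[k] : waiting-time rate theta_k, shared by states k and K + k.
    probs[k] : length-K probability vector (e.g. a Dirichlet draw);
               probs[k][k] is the self-jump probability of form k."""
    K = len(rates)
    Q = np.zeros((2 * K, 2 * K))
    for k in range(K):
        for j in (k, K + k):                  # the two states of form k
            partner = K + k if j == k else k  # target of a self-jump
            Q[j, partner] = rates[k] * probs[k][k]
            for k2 in range(K):
                if k2 != k:                   # non-self jump: entry state only
                    Q[j, k2] = rates[k] * probs[k][k2]
            Q[j, j] = -rates[k]
    return Q
```

A prior draw of such a generator would combine Gamma draws for the rates with Dirichlet draws for each probability vector.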
The proposed GMMPP models offer a considerably flexible structure to model a variety of point process phenomena. This flexibility, however, gives rise to important issues that strongly influence the quality of the statistical analysis to be performed. More specifically, model and prior elicitation should be carefully performed to avoid identifiability problems and favor a reasonable model fit. Reliable prior information about the phenomenon under study should always be used for this purpose. Additional strategies may include preliminary analysis of the data and the use of informative priors. Regarding the former, one may obtain a non-parametric kernel estimate of the intensity function (see Diggle, 1985) and use this to guide the choice of the functional forms and other features (self-jumps, varying starting values) of the model. The use of informative priors should be considered based on the (prior) information acquired. For example, if few transitions are expected, the data would provide little information about the parameters indexing the Q θ matrix. In this case, that information could be used to elicit informative priors for the mean waiting time parameters θ i in terms of the scale of the model (magnitude of the waiting times). The same strategy may be used to set informative priors, also in terms of the scale, for the parameters ψ indexing the adopted functional forms. Generally speaking, the proposed models do not aim at emulating non-parametric structures, which would imply the need for many functional forms with short visits to each one. This would compromise model identifiability and the computational cost. The actual aim of the proposed models is to provide good model fitting and prediction with high gains in terms of computational cost in situations in which a non-parametric structure for the IF is dispensable. All of the issues discussed above are explored in the simulated and real examples presented in Sections 4 and 5.
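The generative model of Section 2 can be sketched via thinning. The code below is our own illustration under simplifying assumptions (fixed starting values, no self-jumps, a grid-based dominating rate rather than an exact bound, and hypothetical function names): it simulates a CTMC trajectory, builds the piecewise IF that restarts at each jump, and thins a dominating homogeneous Poisson process.

```python
import numpy as np

def simulate_gmmpp(Q, g, S, rng):
    """Simulate a GMMPP on [0, S]: a CTMC with generator Q selects which
    functional form g[k](s - last_jump_time) drives the intensity; events
    are drawn by thinning a dominating homogeneous Poisson process.
    Assumes an irreducible Q (strictly negative diagonal)."""
    E = Q.shape[0]
    t, state = 0.0, 0                    # start in state 0 for simplicity
    T, Z = [0.0], [state]                # jump times and visited states
    while t < S:
        t += rng.exponential(1.0 / -Q[state, state])
        if t >= S:
            break
        p = Q[state].copy(); p[state] = 0.0; p /= p.sum()
        state = rng.choice(E, p=p)
        T.append(t); Z.append(state)
    def lam(s):                          # piecewise IF, restarting at jumps
        i = np.searchsorted(T, s, side="right") - 1
        return max(g[Z[i]](s - T[i]), 0.0)
    # grid-based dominating rate: approximate bound inflated by 50%; an
    # exact bound (easy for linear/constant forms) is needed for truly
    # exact simulation
    lam_max = max(lam(s) for s in np.linspace(0.0, S, 2000)) * 1.5
    n = rng.poisson(lam_max * S)
    cand = np.sort(rng.uniform(0.0, S, n))
    keep = rng.uniform(0.0, lam_max, n) < np.array([lam(s) for s in cand])
    return cand[keep], np.array(T), np.array(Z)
```

For instance, a constant form and an increasing line reproduce the qualitative behavior of model I in Figure 2.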
The target posterior distribution of (Z 0 , Z, T, θ, ψ|y) is highly complex, which suggests the use of MCMC as the most reasonable choice to perform inference. Developing an efficient algorithm, however, is not straightforward and ought to consider non-trivial techniques and algorithms to achieve that goal. We propose a model augmentation approach similar to the one proposed in Rao and Teh (2013) but with some adaptations to gain computational efficiency. The model augmentation is based on the augmented representation of a CTMC proposed in Hobolth and Stone (2009) and referred to as uniformization. The CTMC is represented as a discrete-time Markov chain (DTMC) subordinated to a Poisson process. This means that the times of the DTMC, which has the same state space as the CTMC X, are defined by a Poisson process. The augmented component comes from the fact that the DTMC may have transitions from a state to itself. We shall refer to those types of transitions as virtual jumps. The difference between our approach and the one in Rao and Teh (2013) is that we consider a non-homogeneous Poisson process instead of a homogeneous one. The gain in efficiency due to the use of a non-homogeneous PP will be made clear further ahead in the text. Let us start by defining K constants Ω 1 , . . . , Ω K such that Ω k > |Q k |, where Q k is the k-th diagonal element of Q θ . Now let V 0 , V 1 , . . . be a sequence of discrete random variables on {1 : E} and W = (W 1 , W 2 , . . .) a sequence of random times on R + . We define the stochastic process given in (9)-(13), where Q j· is the j-th row of Q θ and B j· is a probability vector given by B j· = 1 j + Q j· /Ω j , where 1 j is a row vector of size E with the j-th element being 1 and all the others being 0. As stated in Proposition 1 below, the process (V 0 , V ), where V = (V 1 , V 2 , . . .), subordinated to the times W, is an alternative representation of the CTMC X. We shall refer to this process as the augmented CTMC.
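The uniformization construction just described can be sketched as follows. The helper names, the state-dependent Poisson clock and the factor-of-two default (echoing the Ω k = 2|Q k | choice discussed later in the text) are our illustrative assumptions, not the authors' code.

```python
import numpy as np

def uniformized_step_matrix(Q, Omega):
    """Row-wise transition matrix of the subordinated DTMC:
    B[j] = 1_j + Q[j] / Omega[j], valid whenever Omega[j] >= |Q[j, j]|.
    The diagonal entry B[j, j] = 1 - |Q[j, j]| / Omega[j] is the
    probability of a virtual jump out of state j."""
    return np.eye(Q.shape[0]) + Q / Omega[:, None]

def simulate_augmented_ctmc(Q, S, v0, rng, factor=2.0):
    """Simulate the augmented chain (V, W) on [0, S]: candidate times W
    from an exponential clock with state-dependent rate Omega[state],
    states V from B. Repeated consecutive states are virtual jumps.
    Assumes strictly negative diagonal entries in Q."""
    Omega = factor * np.abs(np.diag(Q))
    B = uniformized_step_matrix(Q, Omega)
    t, state = 0.0, v0
    W, V = [], []
    while True:
        t += rng.exponential(1.0 / Omega[state])
        if t >= S:
            return np.array(W), np.array(V)
        state = rng.choice(Q.shape[0], p=B[state])
        W.append(t); V.append(state)
```

With factor = 2, each candidate time is a virtual jump with probability 1/2, which is what makes the trajectory resampling step mix.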
Note that the virtual jumps are an extra component that is not defined in the original definition of a CTMC. Finally, note that the result is valid for Ω k ≥ |Q k | but, in order to use this representation in our MCMC context, we require the strict inequality. Equality implies that virtual jumps almost surely do not exist, whilst these are crucial to establish the validity of the MCMC algorithm to be proposed, as will be made clear further ahead in the text. Proposition 1. For any Ω k ≥ |Q k |, the process (V 0 , V, W ) defined in (9)-(13) is a valid augmented representation of a continuous-time Markov chain with initial distribution π 0 and Q-matrix Q θ . Consider now the augmented model that replaces the CTMC X in the original model in (1)-(4) by the augmented CTMC defined in (9)-(13). We define U and T as the virtual and non-virtual jumps of the augmented CTMC, respectively. The vector of all the unknown quantities in the augmented model is ϕ = (W, U, T, V 0 , V, θ, ψ). This means that the aim of the inference procedure is to obtain the posterior distribution of (ϕ|y). Note that there is a redundancy in the definition of ϕ since W = U ∪ T; nevertheless, that is required due to the particular sampling scheme to be adopted in the MCMC. We design a Gibbs sampling algorithm to sample from the target posterior distribution. The blocking scheme and sampling algorithms to be adopted aim at simultaneously optimizing the convergence properties and the computational cost of the Markov chain. We consider the blocks described below. Before describing the algorithms to sample from each block, we present the joint density of (Y, ϕ), given in (14), which is useful to derive those algorithms since all the full conditional densities are proportional to this joint density. Each density in (14) is obtained w.r.t. some suitable dominating measure. The likelihood π(y|V 0 , V, W, R, ψ) is written w.r.t.
the probability measure of a Poisson process with constant rate, as given in (15), where N S is the number of events from y in [0, S] and t n is the time of the n-th event. The densities π(V 0 , V, T, U, W |θ), π(R|V 0 , V, T ) and π(θ) can be obtained from (9)-(13), (7)-(8) and (5)-(6), respectively. Finally, π(ψ) is some suitable continuous density. The block (U, W, V W ) is sampled directly from its full conditional distribution. First note that, conditional on (T, V T , θ), (U, W, V W ) is independent of the data and consists of the virtual jumps. The full conditional distribution of (U, W, V W ) is given by Proposition 2 below. Proposition 2. Defining U (i) as the virtual jumps in (T i , T i+1 ), with T 0 = 0 and T |T |+1 = S, the full conditional distribution of the virtual jumps is such that the U (i)'s are mutually independent. Note that, if Ω k = |Q k |, the number of virtual jumps is a.s. zero and, as a consequence, the MCMC chain is not irreducible, since the non-virtual jumps would be restricted to the set defined by their initial values. In fact, the values of the Ω k 's have great impact on the efficiency of the algorithm. If these are increased, so is the mean number of virtual jumps, which, in turn, improves the mixing of the chain. On the other hand, an increase in the number of virtual jumps leads to an increase in the computational cost of the algorithm, in particular, in the step where (V 0 , V, U, T, R) is sampled. Rao and Teh (2013) suggest the use of Ω = 2 max k |Q k | in the context of inference for MMPPs, based on empirical results. Note, however, that the authors consider a unique Ω for all k, as presented in the original augmented CTMC representation of Hobolth and Stone (2009). This leads to different local mixing properties of the MCMC with respect to different states (in our case, different functional forms) of the CTMC. Moreover, an optimal local choice w.r.t.
the state (functional form) with the largest |Q k | ought to penalize the local computational cost associated with the other states. That issue is the main motivation for us to propose the alternative augmented CTMC with distinct Ω k 's. It allows for a finer optimization of the chain's properties, in the sense of globally optimizing the mixing without penalizing the computational cost. Finally, based on the results of Rao and Teh (2013), we set Ω k = 2|Q k | for all k. The density in (14) implies a factorization into likelihood terms L l (V 0:l , R 0:(l) ), where (l) is the number of non-virtual jumps up to W l and R 0:(l) are all the starting values up to W l (which are not necessarily l values). Directly sampling from the full conditional distribution of (V 0 , V, U, T, R) requires the computation of its probability mass function, which is an (at least) E^{|W |}-dimensional vector. Therefore, in the majority of cases, the computational cost associated with this algorithm is prohibitive. Furthermore, note that, for GMMPPs, each likelihood term L l (V 0:l , R 0:(l) ) depends on V and R up to time W l , because of the dependence on the last non-virtual jump up to W l . For that reason, unlike in the case of inference for MMPPs (see Rao and Teh, 2013), an FFBS scheme cannot be devised to sample from the full conditional distribution of (V 0 , V, U, T, R). Instead, we propose an independent Metropolis-Hastings (MH) step with a proposal distribution q(V 0 , V, R) that aims at approximating the target full conditional by adding suitable normalizing constant terms for each of the l terms L l (V 0:l , R 0:(l) )π(V l |V l−1 , θ) or L l (V 0:l , R 0:(l) )π(V l |V l−1 , θ)π(R (l) |V (l) ), accordingly. More specifically, the proposal is given in (16), where c l (V l ) is the normalizing constant of L l (V 0:l , R 0:(l) )π(R (l) |V l ) and c l is the corresponding normalizing constant in the other case. Note that, when V l corresponds to a virtual jump, c l is the constant that normalizes π(V l |V l−1 , θ)L l (V 0:l , R 0:(l) ).
The acceptance probability of the MH step is given in (17), where the c l 's and c * l 's refer to the current and proposed values, respectively. Note that any trajectory that leads to negative values for the IF is rejected with probability 1. As described before, the constant functional form is the only one for which we assume a continuous prior for its starting values - a Gamma prior. For all the other forms, the required normalizing constants above would typically be intractable for continuous priors. The detailed algorithm to perform the MH step described above is presented in Algorithm 1 of Appendix B. The algorithm for the simpler case with no varying starting value is obtained by applying straightforward simplifications. We use the general result from Mengersen and Tweedie (1996) to establish the uniform ergodicity of the proposed MH subchain. Proposition 3. The Metropolis-Hastings subchain defined by (16) and (17) is uniformly ergodic. Since this is an independent MH algorithm, its efficiency relies heavily on its acceptance rate - the higher the better. Now note that this rate ought to decrease as the number of non-virtual jumps increases. For that reason, we propose an adaptation of the algorithm above that partitions the interval [0, S] and separately samples (V 0 , V, U, T, R) in each of these time intervals from its respective full conditional distribution. In order to have a robustly efficient algorithm, we propose an adapting strategy that starts by updating (V 0 , V, U, T, R) in one block and then partitions this into more blocks if required. The adaptation is considered only up to a certain iteration of the Markov chain so as to guarantee its convergence. Finally, the algorithm to sample (V 0 , V, U, T, R) in each subinterval of time is a direct and straightforward adaptation of Algorithm 1.
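The adapting strategy just described can be sketched as a simple monitoring loop. The helper below is our hypothetical skeleton (mh_update stands in for one sweep of the block-wise MH step and returns the number of accepted block moves); the window size M and threshold r are tuning inputs of the sketch.

```python
def run_with_adaptive_blocks(mh_update, n_iter, M=500, r=0.25, B_max=50):
    """Adaptive partitioning of [0, S] for the independent MH step.
    Every M iterations the empirical acceptance rate is checked; while
    it stays below the threshold r, the number of blocks B is increased
    by one. Adaptation stops once the rate exceeds r, after which B is
    kept fixed (preserving convergence of the chain)."""
    B, accepted, proposed, adapting = 1, 0, 0, True
    history = []
    for it in range(1, n_iter + 1):
        accepted += mh_update(B)   # one sweep with B blocks
        proposed += B
        if adapting and it % M == 0:
            rate = accepted / proposed
            history.append((B, rate))
            if rate < r and B < B_max:
                B += 1
            else:
                adapting = False
            accepted, proposed = 0, 0
    return B, history
```

In the sketch, once adaptation stops, the remaining iterations run with the final block count, mirroring the fixed-kernel phase of the proposed sampler.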
The partitioning strategy ought to be executed with care in order to guarantee that the respective full conditional distribution depends on the likelihood only inside the respective time interval. This means that the limits of the intervals have to be times T i of non-virtual jumps and the proposal distribution requires the restriction that preserves the upper limit of the respective time interval as a non-virtual jump time. That is achieved as follows. Any blocking scheme based on a partition (0 = s 0 , s 1 , . . . , s B = S) must be such that the times s b , b = 1, . . . , B − 1, are non-virtual jump times at the current state of (V 0 , V, U, T, R). Furthermore, the blocks are defined by the intervals [0, s 1 ], (s 1 , s 2 ], . . . , (s B−1 , S]. The adapting partition strategy goes as follows. Set a number of iterations M large enough to obtain reliable estimates of the acceptance rate and a reasonable threshold r for the rate. The algorithm starts with one block - B = 1. Then, after every M iterations, the acceptance rate in those last M iterations is evaluated. If this rate is smaller than r, we make B = B + 1. The adaptation carries on until the computed rate is larger than r. We suggest r ≈ 0.25. Finally, a partition with B blocks is defined by setting the intervals' limits to be the T i 's which are the closest to the times |W |/B, 2|W |/B, . . . , (B − 1)|W |/B. Finally, the continuous-time Markov chain trajectory may be highly correlated with some parameters in ψ, which may compromise the mixing of those parameters. A simple and efficient way to mitigate this problem is to perform multiple updates of (U, W, V W ) and (V 0 , V, U, T, R) on each iteration of the Gibbs sampling. This issue is illustrated in the simulated examples. It is straightforward to simulate from the full conditional distribution of θ given the conditional independence structure and the conjugacy of its prior.
We have that θ k |· ∼ Gamma(α k + m k (V 0 , V ), β k + τ k (V 0 , V, T )), where m k (V 0 , V ) and τ k (V 0 , V, T ) are the total number of visits to and the total time spent at the k-th functional form in [0, S], respectively. Moreover, θ k· ∼ Dirichlet(γ k1 + m k1 (V 0 , V ), . . . , γ kK + m kK (V 0 , V )), with self-jumps, and θ k· ∼ Dirichlet(. . . , γ kk−1 + m kk−1 (V 0 , V ), γ kk+1 + m kk+1 (V 0 , V ), . . .), without self-jumps; where m k 1 k 2 (V 0 , V ) is the total number of transitions from functional form k 1 to functional form k 2 in [0, S]. Concerning the parameters ψ, it follows from (14) that the prior independence among the ψ k 's implies the conditional independence of the respective K full conditionals. Moreover, for a constant g k with fixed starting value, a Gamma(η k , ν k ) prior leads to a Gamma(η k + n k (y), ν k + τ k (V 0 , V, T )) full conditional, where n k (y) is the number of events from y occurring during the time that the IF assumes the functional form g k . For all the other functional forms, we perform MH steps with an adapted random walk proposal (see Roberts and Rosenthal, 2009, Section 2) for each vector ψ k . The acceptance probability for each ψ k is the usual ratio min{1, π(ψ * k |·)/π(ψ k |·)}, where ψ k and ψ * k are the current and proposed values, respectively, and π(ψ k |·) is proportional to the product of the likelihood in (15) and the prior density of ψ k . Prediction is a common procedure associated with the statistical analysis of stochastic processes. In the context of unidimensional Poisson processes, prediction consists in estimating the future behavior of the process, in particular, its intensity function and/or events. The Bayesian approach allows prediction to be made under a probabilistic approach through the predictive distribution. Consider the full Bayesian model of a GMMPP Y in [0, ∞) and let y be a realization of the process in [0, S]. Now define g(Y, V, T, ψ) to be some measurable function, in the probability space of the full Bayesian model, that depends on (Y, V, T ) only in (S, ∞).
Then, prediction about g(Y, V, T, ψ) is made through the predictive distribution of g(Y, V, T, ψ)|y. In an MCMC context, it is straightforward to obtain a Monte Carlo (MC) sample from the predictive distribution as long as it is feasible to simulate from the full model. A MC sample is obtained by simulating g(Y, V, T ) conditional on each value simulated along the MCMC (after a burn-in period), due to the fact that π(g(Y, V, T, ψ)|y) = ∫ π(g(Y, V, T, ψ)|ϕ, y)π(ϕ|y)dϕ. Appealing examples of g(Y, V, T, ψ) include functionals of the future intensity function (examples i. and ii.) and of the number of future events (example iii.). For examples i. and ii., it is enough to simulate the CTMC X conditional on each sample of (X S , θ) and compute g for the respective sampled value of ψ. For example iii., an extra step is required to simulate from a Poisson(Λ S ) distribution, conditional on each simulated value of Λ S . This section presents a collection of simulated examples to explore important issues related to the methodology proposed in this paper. In particular, we explore: 1. the impact of the number of observations and the number of jumps in the IF on the computational cost of the MCMC algorithm; 2. a sensitivity analysis for the priors of ψ and θ; 3. the efficiency in estimation and prediction (with replications). Convergence diagnostics are obtained based on the MCMC chains for the parameters and for the posterior log-density. Computational cost is evaluated in terms of the average time (in seconds) to obtain 100 effective samples of the posterior log-density. The effective sample size of an MCMC sample of size n is defined as n ess = n/(1 + 2 ∑ ∞ j=1 ρ j ), where ρ j is the autocorrelation of order j of the chain. It is such that the variance of the ergodic average of the n values from the chain is the same as the variance of the ergodic average of an independent sample (from the target distribution) of size n ess . For the two examples in which the GMMPP is compared to a non-parametric IF model, we consider the effective samples of the log-likelihood instead of the posterior log-density.
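The effective sample size used to measure computational cost can be computed from the empirical autocorrelations. The sketch below truncates the infinite sum at the first non-positive autocorrelation, a common practical cutoff; this truncation rule is our choice and is not specified in the paper.

```python
import numpy as np

def effective_sample_size(x, max_lag=None):
    """n_ess = n / (1 + 2 * sum_j rho_j), with the sum over empirical
    autocorrelations rho_j truncated at the first non-positive term."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n  # lag-k autocovariances
    rho = acov / acov[0]
    max_lag = max_lag or n - 1
    s = 0.0
    for j in range(1, max_lag + 1):
        if rho[j] <= 0:
            break
        s += rho[j]
    return n / (1 + 2 * s)
```

For a nearly independent chain the estimate is close to n, while for a strongly autocorrelated chain (e.g. an AR(1) with coefficient 0.9) it is a small fraction of n.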
All the examples are implemented in Ox (Doornik, 2009) and run on an i7 3.4GHz processor with 16MB RAM. Codes are available upon request to the authors. In this section we investigate the computational cost associated with the proposed methodology. In particular, we investigate the impact of the number of observations and the number of changes in the IF trajectory. As has been emphasized before, the low computational cost is at the core of the main contributions of this paper. We simulate five scenarios with the same behavior for the IF (functional forms and changes) in the same time interval but with different levels of magnitude. We consider three functional forms - increasing and decreasing lines and a constant - with fixed starting values and no self-jumps allowed. Table 1 presents the specific functional forms, length of stay, mean number and actual number of observations. We fix the Q-matrix so that all the states have a mean staying time of 20 units and uniformly distributed transition probabilities. Multiple updates of the blocks (U, W, V W ) and (V 0 , V, U, T, R) are performed to control the high autocorrelation of the parameters of the increasing line functional form - 5 updates for scenarios A1 and A2, 15 for A3 and A4 and 25 for A5. Moderately informative priors are adopted for ψ in scenario A1, namely ψ 11 ∼ N(1, 2 2 ) (intercept of the increasing line), ψ 12 ∼ N(0.5, 0.6 2 ) (slope of the increasing line), ψ 21 ∼ N(5, 2 2 ) (intercept of the decreasing line), ψ 22 ∼ N(−0.5, 0.6 2 ) (slope of the decreasing line), ψ 31 ∼ Gamma(1, 1) (constant). For all the other scenarios, independent uniform improper priors are used for all parameters but ψ 31 , for which a Gamma(1, 1) is also used. Results regarding the estimation of the IF and of the ψ parameters are presented in Figure 9 and Table 6 in Appendix C.
They show a reasonably good recovery of the IF and parameters already for the dataset with only 103 observations, with the estimation improving substantially with the size of the dataset. The relation between the computational cost and the number of observations is shown in Figure 3. We highlight the computational efficiency of the proposed MCMC algorithm, as shown by the running times; the methodology is efficient enough to be applied to very large datasets. For example, the total running time to obtain an effective sample size of 100 for the log-posterior density is around 2.3 minutes for the dataset with 10 thousand observations and 18 minutes for the dataset with 30 thousand observations.

We simulate three scenarios with the same average number of observations and three functional forms - increasing and decreasing lines and a constant, with fixed starting values and self-jumps allowed. The IF is simulated from the same CTMC prior but considering different total observation times in order to have considerably different numbers of changes in the IF. The number of observations is approximately 2000 in all scenarios. Table 2 presents the specific functional forms, average length of stay per visit and number of changes in the IF. The priors on the Q-matrix diagonal parameters are θ_1 ∼ Gamma(1, 10), θ_2 ∼ Gamma(1, 10) and θ_3 ∼ Gamma(1, 5) and, for the transition probability vectors, we adopt a uniform prior on the respective simplex. Finally, improper uniform priors are adopted for all the ψ parameters. The blocks (U, W, V_W) and (V_0, V, U, T, R) are updated 5 times in each iteration of the Gibbs sampling. Results regarding the estimation of the IF and of the ψ and θ parameters are presented in Figure 10 and Table 7 in Appendix C. They show a very good recovery of the IF and parameters. The relation between the computational cost and the actual number of changes in the IF is shown in Figure 4.
Again, we highlight the computational efficiency of the proposed MCMC algorithm, as shown by the running times. The total running time to obtain an effective sample size of 100 for the log-posterior density is around 105 seconds for the dataset with 40 changes in the IF and approximately 2 thousand observations.

We perform a prior sensitivity analysis for the parameters ψ in scenarios A1, A3, A5 and B2. These examples are run with non-informative and moderately informative priors; the latter are set based on the scale of each example. A prior sensitivity analysis for the parameters θ in the diagonal of the Q-matrix is also performed for scenarios B1 and B3, again with non-informative and moderately informative priors. In the first analysis, the Q-matrix is fixed, for all the A* scenarios, at the same values as in Section 4.1. For scenario B2, the same non-informative priors from Section 4.1 are adopted. The prior on the constant IF parameter ψ_31 is set to be Gamma(1, 1) in all cases. For the parameters indexing the other two functional forms, we compare the results for improper uniform priors with those for the moderately informative priors shown in Table 3. Results for the parameter estimates are shown in Table 8 in Appendix C and indicate that larger differences are observed only for the parameters of the increasing line. As should be expected, the variances of those parameters are greater under the non-informative priors, for which the posterior density is also more asymmetric. Results for the IF go in the same direction, with significant differences observed only for scenario A1 in the time period associated with the increasing line. The posterior distribution of the IF is more influenced by the data under the non-informative prior, as expected - see Figure 14 in Appendix C. The second sensitivity analysis concerns the prior distribution of the parameters θ in the diagonal of the Q-matrix.
Improper uniform priors are adopted for all the ψ parameters except for the constant value ψ_31, which has a non-informative Gamma(1, 1) prior. Uniform priors on the simplex are adopted for all the transition probability vectors in the Q-matrix. The non-informative priors for the Q-matrix diagonal parameters are improper uniform distributions and the informative ones are θ_1 ∼ Gamma(1, 10), θ_2 ∼ Gamma(1, 10) and θ_3 ∼ Gamma(1, 5). Results (omitted here) are virtually the same for the two prior specifications w.r.t. the estimated IF, the ψ parameters and the transition probability vectors from the Q-matrix. As for the parameters in the diagonal of the Q-matrix, small yet non-negligible differences are observed, with slightly larger variances in the non-informative priors case.

We now investigate the efficiency of the proposed methodology in terms of estimation and prediction by considering replications of the same model. We consider the IF from scenarios A1, A3 and B2 and generate 50 independent datasets for each one. Prediction of the integrated IF in the interval [400, 800] is performed for scenario B2 by sampling from its predictive distribution. In order to summarize the performance of the proposed model, we consider the posterior distribution of the following measure of fit: where λ_R is the real intensity function. Results are shown in Figures 11, 12 and 13 and reveal a very good performance of the proposed methodology in estimating and predicting the IF.

We compare the proposed class of models to that proposed in Gonçalves and Gamerman (2018), in which the IF is assumed to be a continuous positive function of a latent Gaussian process. The computational cost associated with the MCMC algorithm from Gonçalves and Gamerman (2018) is O((λ_sup S)^3), where λ_sup is the supremum of the IF in [0, S]. This cost is determined by the sampling of multivariate normal distributions, which is required due to the use of a latent Gaussian process.
This implies not only that the cost is larger and grows much faster than that of our methodology, but also that it is not feasible to apply that methodology to very large datasets. We consider a dataset of size 302 generated from the IF λ(s) = 20 exp{−s/5} + 1.5 exp{−(s − 25)^2/50} in [0, 50], excluding the 8 observations generated in [9, 14]. The GMMPP is fit with decreasing and increasing lines and a constant. The estimated IFs for both models are shown in Figure 15 in Appendix C and indicate that similar results are obtained for both models. The computational time per 100 effective samples of the log-likelihood is 15.32 seconds for the GMMPP and 2 hours for the non-parametric IF model (with no approximations to simulate from the Gaussian process).

We apply the proposed methodology to the classic coal mining disasters data of Jarrett (1979), consisting of the dates of 191 explosions in coal mines that killed ten or more men in Britain between 15th March 1851 and 22nd March 1962 (rescaled to [0, 112], year unit). We also analyze the data with the non-parametric IF model of Gonçalves and Gamerman (2018). Based on an empirical analysis of the data (see Figure 1), we set two functional forms - a decreasing and an increasing line, with varying starting values. We adopt the following priors: Uniform(0.1, 0.3, . . . , 4.7) for both varying starting values, improper uniform for both slopes, Gamma(1, 20) for all the diagonal parameters of the Q-matrix and uniform priors for the transition probabilities. The computational time per 100 effective samples of the log-likelihood is 2.93 minutes for the GMMPP and 8.6 minutes for the non-parametric IF model (with no approximations to simulate from the Gaussian process). The estimated IF for both models is shown in Figure 5. The posterior mean and standard deviation of the integrated IF are 197.6 and 13.8 for the GMMPP and 193.4 and 14.5 for the non-parametric IF model.
The posterior mean and standard deviation of the slopes are -0.0406 and 0.0388 for the negative one and 0.1620 and 0.3709 for the positive one. The same statistics for the mean waiting times are 11.70 and 4.18 for the decreasing line and 9.36 and 3.16 for the increasing line.

Figure 5: Estimated IF (posterior mean and 95% CI) for the coal mining data for the GMMPP (red) and non-parametric IF (blue) models.

We consider the exchange rate between the US Dollar and the Brazilian Real. The dataset consists of the 1163 days, between Jan 2000 and Dec 2017 (rescaled to [0, 216], month unit), on which the exchange rate varied more than 1%. Prediction is performed for the period from Jan 2018 to Apr 2020. Based on an empirical analysis of the data (see Figure 1), we set two functional forms - a decreasing and an increasing line, both with varying starting values. We adopt the following priors: Uniform(5, 5.2, . . . , 12) and Uniform(0, 0.2, . . . , 6) for the varying starting values of the decreasing and increasing lines, respectively, and improper uniform priors for both slopes. A Gamma(1, 40) is assumed for the diagonal parameters of the Q-matrix and uniform priors are used for the transition probabilities. The observed time interval is divided into 6 blocks to update the CTMC component. The MCMC algorithm takes around 4.6 minutes to draw 100 effective samples. The estimated IF is shown in Figure 6. The posterior mean and standard deviation of the integrated IF are 1143.1 and 34.3. The mean and standard deviation of the slopes are -0.3102 and 0.0651 for the negative one and 0.1176 and 0.0355 for the positive one. The predictive distribution of the integrated intensity between Jan 2018 and Apr 2020 is shown in Figure 16 in Appendix C and has mean 151.4, standard deviation 35.80 and 95% CI (78.44, 218.29). The real observed number of events is 151.
We consider a model which we believe to be of practical use for modeling epidemic phenomena. The idea is to model each cycle of the IF as having an exponential growth, some period of stabilization and then an exponential decay. Moreover, in order to mimic the expected behavior of epidemic curves, we need the exponential growth and decay rates to change over time. This behavior can be emulated by using a cdf, in particular the standard normal cdf. The parametrization of the model is such that the model is flexible and the parameters have a clear interpretation. The idea is to model each cycle of the epidemic phenomenon after the IF starts to decrease. This means that the IF is known to start in the increasing functional form and then to have one change to the decreasing one; therefore, prediction concerns its future decreasing behavior. The model is the following: where g_1 and g_2 are the increasing and decreasing curves, respectively, and γ(ψ, T_1) is set to be (b_1 + aΦ(d_1 + c_1 T_1) − b_2)/Φ(d_2) to guarantee the continuity of the IF at the change time.

Figure 6: Estimated IF (posterior mean and 95% CI) for the exchange rate data.

We impose some restrictions on the parameter space so as to ease model identifiability and parameter interpretation. We set c_1 > 0, c_2 > 0, a > 0, b_1 ≥ 0 and b_2 ≥ 0 as standard identifiability restrictions. We also set the less obvious restriction d_2 < 3 so that the time period of constant behavior of the IF is mostly accommodated by the end of the increasing function g_1 and, consequently, identifiability of the change point is favored. Also, note that in order to estimate the stabilization level b_2 the data need to include the stabilization period; otherwise, this parameter should be fixed (e.g. at zero). The restrictions above lead to a clear interpretation of the model's parameters, as follows.
• b_1: identifies the initial value of the IF - typically around b_1;
• a: defines the maximum value assumed by the IF - typically ≈ b_1 + a;
• d_1: defines the initial growth rate of the IF curve;
• c_1: defines the rate at which the growth curve changes and the maximum growth rate;
• d_2: defines the initial decay rate of the IF curve;
• c_2: defines the rate at which the decay curve changes and the maximum decay rate;
• b_2: defines the stabilization level after the epidemic period.

Furthermore, the maximum slopes of the growth and decay curves are given by ċ_1 = a c_1/√(2π) and ċ_2 = −γ(ψ, T_1) c_2/√(2π), respectively. In order to improve the mixing of the MCMC algorithm by reducing the correlation among the parameters, we reparametrize the model in terms of (ċ_1, ċ_2) instead of (c_1, c_2), which implies that c_1 = √(2π) ċ_1/a and c_2 = −√(2π) ċ_2/γ(ψ, T_1). We highlight the fact that the Bayesian approach and the variance of the Poisson process conditional on its IF provide considerable flexibility and suitable uncertainty quantification to model epidemic curves, especially when compared to deterministic models applied directly to the number of events. Finally, this model can be extended to have more flexible curves by considering the cdf of other distributions such as the Student-t, skew-normal and skew-t. This can account, for example, for skewed growth and decay curves and for cases in which the epidemic curve decays faster than it grows up to a certain time but then takes longer to stabilize, suggesting the use of a heavy-tailed cdf to model the decay.

The model in (19)-(21) has features that allow for some improvements in the MCMC algorithm from Section 3.2. It is now possible to sample directly from the full conditional distribution of the block (V_0, V, U, T, R) at a reasonable computational cost. That is because the condition of having only one change in the IF is imposed and, therefore, the size of the state space of this discrete full conditional distribution is |W|.
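The parametrization above can be sketched in code. Since equations (19)-(21) are not reproduced here, the functional forms below are an assumption reconstructed from the stated continuity constraint γ(ψ, T_1) = (b_1 + aΦ(d_1 + c_1 T_1) − b_2)/Φ(d_2) and the stated maximum slopes; the parameter values in the test are made up for illustration.

```python
import math

def Phi(x):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def epidemic_intensity(t, b1, a, d1, c1, b2, d2, c2, T1):
    """Sketch of the epidemic IF: cdf-shaped growth g1 up to the change
    time T1, then cdf-shaped decay g2; gamma enforces continuity at T1
    exactly as the constraint in the text requires (the precise form of
    g1 and g2 is an assumption consistent with that constraint)."""
    gamma = (b1 + a * Phi(d1 + c1 * T1) - b2) / Phi(d2)
    if t <= T1:
        return b1 + a * Phi(d1 + c1 * t)          # growth phase g1
    return b2 + gamma * Phi(d2 - c2 * (t - T1))   # decay phase g2
```

With this form, the IF starts near b_1, rises toward roughly b_1 + a, and decays toward the stabilization level b_2, matching the parameter interpretations listed above; the maximum slopes a c_1/√(2π) and −γ c_2/√(2π) follow from differentiating the cdf terms.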
This also allows us to increase the value of Ω_k and, consequently, improve the mixing of the chain without compromising the cost. One may consider, for example, Ω_k = 5|Q_k|. Another strategy to boost computational efficiency is to truncate the change time to lie inside a suitable interval, based on the empirical IF. This interval is conservatively chosen so that it is certain that the change occurs inside it. In order to analyze the data, we distribute the cases uniformly within their respective week and use day as the scale unit. The total number of cases in that period was 30700 in Ceara and 331411 in Parana. The epidemic curve in Parana is relatively close to stabilization, but still decaying, so we also predict the time until stabilization - when the IF hits 110. Data is available in the InfoDengue system (Codeco et al., 2018). For the Ceara data, we were compelled to restrict the parameter a to the interval (0, 250) - safely higher than the maximum of the empirical IF - in order to avoid the growth curve being fit by only (around) half of the cdf, which avoids identifiability and computational problems. For the parameter d_2 we set the priors Uniform(−∞, 3) for Ceara and Uniform(0, 3) for Parana. Improper uniform priors are adopted for all the other parameters. The diagonal values of the Q-matrix are fixed at (1/120, 1/250) for Ceara and (1/180, 1/90) for Parana. Results are shown in Figure 7 and Table 4.

We also analyze data from the COVID-19 pandemic in Switzerland and Romania. Whilst the epidemic curve has already stabilized for the former, it is still decaying for the latter. For that reason, we predict the time until stabilization - when the IF hits 20 - for Romania. The dataset for Switzerland concerns the 30845 cases notified from Feb 25th (date of the first notification) to May 30th - 96 days. The data is obtained from (Demarqui and Santos, 2020).
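The empirical IF used above to choose the truncation interval and the bound on a can be obtained, for example, by simple binning of the event times (a kernel smoother is another common choice); this is a hypothetical sketch, not the authors' implementation:

```python
def empirical_intensity(events, S, n_bins=20):
    """Crude empirical intensity: number of events per unit time in each
    of n_bins equal-width bins of [0, S]."""
    width = S / n_bins
    counts = [0] * n_bins
    for t in events:
        # clamp t == S into the last bin
        counts[min(int(t / width), n_bins - 1)] += 1
    return [c / width for c in counts]
```

A conservative truncation interval for the change time, and a bound on a safely above the maximum of this estimate (such as the (0, 250) used for Ceara), can then be read off from the shape and peak of the binned rates.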
For the Switzerland data, we restrict the parameter a to the interval (0, 1300) for the same reasons that we restrict it for the Ceara Dengue Fever data. For the parameter d_2 we set the priors Uniform(−∞, 3) for Switzerland and Uniform(0, 3) for Romania. Improper uniform priors are adopted for all the other parameters. The diagonal values of the Q-matrix are fixed at (1/30, 1/60) for Switzerland and (1/50, 1/40) for Romania. Results are shown in Figure 8 and Table 5.

This paper proposed a novel class of unidimensional Cox processes in which the intensity function assumes predefined functional forms and alternates among them according to the jumps of a continuous time Markov chain. This novel class aims at providing an efficient way to perform useful statistical analyses of unidimensional point processes at a very reasonable computational cost, especially when compared to non-parametric approaches based on latent Gaussian processes. Important issues regarding model elicitation and identifiability and some aspects of the MCMC algorithm are discussed and explored in simulated studies. Model elicitation should be based on prior knowledge and/or empirical analyses of the data. Whilst non-informative priors work well for the parameters indexing the functional forms, prior elicitation for the parameters in the Q-matrix requires special attention. If not many changes are expected, the parameters in the diagonal should be fixed at values coherent with the scale of the data so as to avoid very short visits. For the transition probabilities, uniform priors are suitable in any case. The proposed MCMC algorithm performs exact Bayesian inference for the proposed model, so that only Monte Carlo error is involved. The algorithm is carefully devised to efficiently sample from the posterior distribution of all the unknown quantities in the model. In particular, the blocking scheme to sample from the CTMC trajectory has shown to be crucial to obtain a computationally efficient algorithm.
Simulated studies illustrated the computational and statistical efficiency of the proposed methodology under different circumstances. In particular, efficient solutions for large datasets are obtained at a reasonable cost. A particular model to analyze epidemic data is proposed so that asymmetric epidemic curves can be properly accommodated. This model is used to fit datasets regarding Dengue fever in Brazil and COVID-19 in some countries. Results are quite interesting and include predictions for the curves which have not yet stabilized. The applicability of the methodology to large datasets is illustrated in those examples - one of them has over 300 thousand observations. Two other real datasets are also analyzed to illustrate the applicability of the proposed methodology. Prediction is performed for one of them, providing good results. Finally, the results indicate that, typically, models with only straight lines (increasing, decreasing and constant) are enough to provide a good fit.

Clearly, where I_S := I(T_{|T|+1} > S), so that P(I_S = 1 | T_{|T|}) = exp{−Q_{V_{|T|}}(S − T_{|T|})}. The result above establishes part i. of the proposition. We now obtain the full conditional density of U^(i), i = 0, . . . , |T| − 1, w.r.t. the measure of a unit rate Poisson process. We have that, where |U_i| is the number of virtual jumps in [T_i, T_{i+1}). This establishes part ii. of the proposition for i = 0, . . . , |T| − 1. For i = |T|, we have an analogous expression, which concludes the proof.

In order to establish uniform ergodicity of an independent MH chain, it is enough to show that the ratio q/π, where π is the target density, is uniformly bounded away from zero on the support of π (Mengersen and Tweedie, 1996).
Note that the fact that c_l > 0 guarantees that this ratio is bounded away from zero.

Appendix C table (true values, posterior means and posterior standard deviations of the Q-matrix parameters):
0.066 | 0.07 | 0.03 | 95% CI (0.03, 0.13)
(θ_11, θ_12, θ_13) | (1/3, 1/3, 1/3) | (0.12, 0.46, 0.42) | (0.10, 0.14, 0.14)
(θ_21, θ_22, θ_23) | (1/3, 1/3, 1/3) | (0.53, 0.27, 0.20) | (0.17, 0.14, 0.14)
(θ_31, θ_32) | (0.5, 0.5) | (0.67, 0.33) | (0.18, 0.18)

Acknowledgments. The first author would like to thank FAPEMIG and CNPq for financial support. The second author would like to thank CAPES for financial support. The authors would like to thank Fabio Demarqui for helping to obtain the COVID-19 data and Leonardo Bastos for helping to obtain the Dengue Fever data.

Proof of Proposition 1. Let W^(1) be the first non-virtual jump and V^(1) the state of V at W^(1). The density of (W^(1) | V_0) with respect to the Lebesgue measure follows directly, and analogous calculations establish the required result for (V^(m) | V^(m−1)).

Algorithm 1: MH step for (V_0, V, R)
Compute c = (c_0, . . . , c_|W|) from (16) for the current values of (V_0, V, R).
Sample V*_l ∼ Multinomial(1, p).
If V_{l−1} = V_l, sample R^(l)* from the density or probability vector c_l(V_l) L_l(V_{0:l}, R^(0:l)) π(R^(l) | V_l).
Sample u ∼ Uniform(0, 1).
If u < 1 ∧ ∏_{j=1}^{|W|} (c_j / c*_j), return (V*_0, V*, . . . , R*); otherwise, return (V_0, V, R).

References
Codeco, C. T. et al. (2018). Infodengue: a nowcasting system for the surveillance of arboviruses in Brazil.
Diggle, P. (1985). A kernel method for smoothing point process data.
Doornik, J. A. (2009). An Object-Oriented Matrix Programming Language Ox 6.
Gonçalves, F. B. and Gamerman, D. (2018). Exact Bayesian inference in spatio-temporal Cox processes driven by multivariate Gaussian processes.
Gonçalves, F. B. et al. (2020). Exact Bayesian inference for diffusion driven Cox processes.
Hobolth, A. and Stone, E. A. (2009). Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution.
Jarrett, R. G. (1979). A note on the intervals between coal-mining disasters.
Mengersen, K. L. and Tweedie, R. L. (1996). Rates of convergence of the Hastings and Metropolis algorithms.
Møller, J., Syversveen, A. R. and Waagepetersen, R. P. (1998). Log Gaussian Cox processes.
Rao, V. and Teh, Y. W. (2013). Fast MCMC sampling for Markov jump processes and extensions.
Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC.