Franco-Villoria, Maria; Ventrucci, Massimo; Rue, Haavard. Variance partitioning in spatio-temporal disease mapping models. Statistical Methods in Medical Research, 2021-09-27. DOI: 10.1177/09622802221099642

Abstract: Bayesian disease mapping, while undeniably useful for describing variation in risk over time and space, comes with the hurdle of prior elicitation on hard-to-interpret random effect precision parameters. We introduce a reparametrized version of the popular spatio-temporal interaction models, based on Kronecker product intrinsic Gaussian Markov Random Fields, that we name the variance partitioning (VP) model. The VP model includes a mixing parameter that balances the contribution of the main and interaction effects to the total (generalized) variance and enhances interpretability. The use of a penalized complexity prior on the mixing parameter aids in coding prior information in an intuitive way. We illustrate the advantages of the VP model using two case studies.

The Covid-19 pandemic has shaken the world. In Italy, the first two cases were confirmed on 31st January 2020, and on 9th March 2020 a national lockdown was put in place by the authorities to control and reduce the expansion of the virus. Data on newly infected people have been routinely collected since then to monitor the evolution of the disease. The study of the pandemic evolution can be tackled using disease mapping. Knowledge of how the infection has spread can help to evaluate the performance of containment measures. In particular, the quantification of the space-time interaction, which describes how the spatial patterns change over time, has been proposed as a way to deepen our understanding of the evolution of the disease (Diggle et al., 1995).
Disease mapping models (Martinez-Beneito and Botella-Rocamora, 2019; Lawson, 2018; MacNab, 2022; Wakefield, 2007) aim to describe the variation in risk of a particular disease over space and time. Data are usually available in the form of counts aggregated at some spatial level, such as counties or municipalities. Additive time and space models have long been used to model disease rates (Knorr-Held and Besag, 1998). More recently, the availability of complex data has made it possible to consider richer models that include an interaction term to appropriately capture space-time relationships in the data (see, for example, Abellan et al. (2008), Knorr-Held (2000), Waller et al. (1997) and Bernardinelli et al. (1995), to cite a few). Understanding the spatial distribution of disease risk, and how it has evolved over time, can help public health authorities plan resource allocation and identify areas to be prioritized. In particular, the space-time interaction may reveal important information regarding the nature of the disease, for example suggesting whether a new disease is possibly infectious (Aldstadt, 2007) or whether additional causes exist in non-infectious cases (Robertson et al., 2010). A model that is able to quantify the importance of this term is therefore desirable from a practical point of view; in this paper we introduce a model parametrization that partitions the total variance into main and interaction effects so that the contribution of each can be quantified. Crude disease rates are unreliable due to sampling variability, so smoothing is used to borrow information across neighbouring areas and time points.
For this reason, disease mapping has developed mainly within a Bayesian hierarchical model formulation, where the building blocks of a smooth in one or more dimensions can be modelled using intrinsic Gaussian Markov Random Fields (IGMRFs), such as the first and second order random walk (Lindgren and Rue, 2008) or the ICAR model (Besag, 1974). For modelling interactions, statisticians have used tensor product smoothers, where, in a Bayesian framework, the penalty can be seen as a special type of GMRF called a Kronecker product GMRF (Rue and Held, 2005). In Bayesian spatio-temporal disease mapping, the precision parameter of the IGMRF controls the degree of smoothing applied over time and space. A number of issues related to prior elicitation need to be addressed when dealing with intrinsic models. Firstly, the precision matrix is singular, which means that the total variance that we aim to partition is not finite. In order to define priors on the variance components we can rely upon the concept of the generalized variance of an IGMRF; this has been defined by Sørbye and Rue (2014) as the geometric mean of the diagonal elements of the generalized inverse of the precision matrix of the IGMRF, and can only be computed under linear constraints. A second issue to bear in mind is that the generalized variance of an IGMRF depends on the structure matrix, and hence changes with, for example, the temporal and spatial resolution or the size of the dataset at hand. This means that interpretation of the precision parameter becomes case-dependent, making prior elicitation and parameter interpretation difficult. To avoid this problem, Sørbye and Rue (2014) advise scaling the structure matrix so that the generalized variance equals 1; this way the precision parameter is automatically rescaled and the prior has the same meaning regardless of the graph structure (Freni-Sterrantino et al., 2018).
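The scaling step can be sketched numerically: compute the generalized inverse of the structure matrix, take the geometric mean of its diagonal, and rescale the structure matrix by it. A minimal numpy illustration for a first order random walk (function names are our own, not the authors' code):

```python
import numpy as np

def rw1_structure(n):
    """Unscaled structure matrix R = D^T D of a first order random walk,
    with D the (n-1) x n first-order difference matrix."""
    D = np.diff(np.eye(n), axis=0)
    return D.T @ D

def generalized_variance(R):
    """Geometric mean of the diagonal of the generalized inverse of R, as in
    Sorbye and Rue (2014); the pseudo-inverse implicitly handles the
    sum-to-zero constraint on the null space."""
    Rinv = np.linalg.pinv(R)
    return float(np.exp(np.mean(np.log(np.diag(Rinv)))))

def scale_structure(R):
    """Rescale R so that the generalized variance of the IGMRF equals 1."""
    return R * generalized_variance(R)

R_scaled = scale_structure(rw1_structure(50))
# the generalized variance of the scaled structure is 1
```

After scaling, a given prior on the precision parameter encodes roughly the same amount of smoothing whatever the graph size or resolution, which is the point made above.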
Scaling becomes particularly relevant in the context of space-time models, as otherwise differences in the structure matrices of the spatial, temporal and spatio-temporal terms would have an uncontrolled impact on the priors for the corresponding precision parameters. By scaling the structure matrices of the temporal and spatial random effects, the structure matrix of the interaction, defined as a Kronecker product IGMRF, is automatically scaled. Beyond the issues mentioned above, the choice of priors for variance parameters has received much attention in the literature (Gelman, 2006; Wakefield, 2007; Fong et al., 2010; Simpson et al., 2017). Part of the difficulty in choosing a prior stems from interpreting variance parameters, especially for intrinsic processes, where the standard deviation is to be interpreted conditionally (Fong et al., 2010; Riebler et al., 2016). On top of that, in models with several terms the tendency is to set priors independently for each precision parameter, while some authors have begun to recognize that it may be more practical to think about the total variability and how each term in the model contributes to it, rather than to concentrate on single variance components separately (Wakefield, 2007; Riebler et al., 2016; Fuglstad et al., 2020; Ventrucci et al., 2020). In the context of disease mapping, Wakefield (2007) proposes an inverse Gamma prior on the total variability, along with a Beta prior that distributes the variance between a spatially correlated random field and a spatially unstructured effect (the so-called BYM model; Besag et al., 1991). Using a similar parametrization, Riebler et al. (2016) present a prior that shrinks towards no spatial effect following the penalized complexity (PC) prior approach of Simpson et al. (2017). Outside the disease mapping literature, Ventrucci et al.
(2020) develop a PC prior in one-factor mixed models for the relative contribution of group-specific variability. In a more general context, Fuglstad et al. (2020) introduce a framework for hierarchically distributing the variance in additive models, where, at each level of the total variance decomposition, ignorance or preference about the variance contribution of a term is expressed via a Dirichlet or a PC prior, respectively. We add to this literature by also considering the temporal dimension in disease mapping models. In particular, all the terms in the model (main effects and interaction) are assumed to follow intrinsic models, which differentiates our work from the literature mentioned above. In this work, we revisit the spatio-temporal models proposed by Knorr-Held (2000), where the space-time interaction term can be one of four different types, depending on the degree of dependence assumed between time and space. These four types are characterized by different prior assumptions, expressed in terms of a Kronecker product. We propose an intuitive reparametrization that partitions the generalized variance between the main effects and the interaction. The main and interaction effects are not independent, and hence a joint prior on these terms is preferable. We achieve this by including a mixing parameter that (1) eases interpretation and (2) naturally leads to a prior that is intuitive to elicit. One of the advantages of the Bayesian framework is that whenever information on the disease process is available, it can be encoded into the prior (Robertson et al., 2010). Often, the epidemiologist may have an intuition of how important the interaction term is in explaining the spatio-temporal variation of a particular disease. However, translating this information into a precision parameter is far from trivial. We follow the penalized complexity (PC) prior framework of Simpson et al.
(2017) to derive a prior for the mixing parameter that avoids overfitting by construction and allows the user to encode prior information easily. In this way we alleviate both problems, by considering an interaction model that not only enhances interpretability but also permits a more intuitive construction of the prior. We call this reparametrized version the variance partitioning (VP) model. The proposed methodology is applicable to any of the four space-time interactions described in Knorr-Held (2000). The rest of the paper is organized as follows. Section 2 covers spatio-temporal disease mapping models, with particular emphasis on the space-time interaction framework of Knorr-Held (2000), followed by a brief discussion of priors for variance parameters with special attention to the PC prior approach. In Section 3 the VP model is described in detail and the PC prior for the mixing parameter is presented, while the technical details are relegated to the supplementary material. Section 4 illustrates the proposed model on two case studies, a well known example in the disease mapping literature and an Italian Covid-19 dataset. The paper closes with a discussion in Section 5.

Consider data on n1 time points and n2 non-overlapping areas, where y_ij is the observed number of cases at time i = 1, ..., n1 in area j = 1, ..., n2. The most commonly used models for y_ij are the binomial and the Poisson; in either case, the model on the linear predictor scale can be written as η_ij = α + f1(i) + f2(j) + f12(i, j), where f1(i) and f2(j) represent the main temporal and spatial effects, respectively, and f12(i, j) captures the space-time interaction. The model can be parametrized with random effects as

η_ij = α + β_{1,i} + β_{2,j} + δ_ij,    (1)

where β1 = (β_{1,1}, ..., β_{1,n1})^T and β2 = (β_{2,1}, ..., β_{2,n2})^T are vectors of random effects describing the temporal and spatial main effects, respectively, and δ = {δ_ij}, i = 1, ..., n1, j = 1, ...
, n2, is the vectorized spatio-temporal interaction term. The random effects β1, β2 and δ are typically assumed to be smooth processes modelled using intrinsic Gaussian Markov Random Fields (IGMRFs; Rue and Held, 2005), a special type of improper GMRF, defined below. Appropriate constraints (Goicoa et al., 2018) need to be imposed to ensure identifiability of the terms in (1). The constraints on the interaction term are summarized in Table 1, while for the temporal and spatial main effects it is enough to impose a sum-to-zero constraint. As usual, any available covariates can be included in model (1) as fixed effects.

Definition 1 (Improper GMRF). Let Q be an n × n symmetric positive semi-definite (SPSD) matrix with rank n − p > 0. Then x = (x_1, ..., x_n)^T is an improper GMRF of rank n − p with parameters (µ, Q) if its density is

π(x) = (2π)^{−(n−p)/2} (|Q|*)^{1/2} exp(−(1/2) (x − µ)^T Q (x − µ)),

where |Q|* is the generalized determinant of the precision matrix Q, i.e. the product of its nonzero eigenvalues. Improper GMRFs are used as smoothing priors in structured additive regression (STAR) models, a flexible class including generalized linear mixed models, temporally dynamic models, spatially varying coefficient models, etc.; for an account of STAR models see Fahrmeir et al. (2013) and references therein. Following Rue and Held (2005), we define an IGMRF of order 1 as an improper GMRF with Q1 = 0, i.e., the precision matrix is singular with null space spanned by a column vector of ones, 1_n, of length n. Popular examples of IGMRFs of order 1 are the first order random walk (RW1), a possible option to model the temporal main effect β1, and the intrinsic conditional autoregressive (ICAR) model of Besag (1974), which is often assumed in disease mapping to model the spatial effect β2 when smoothing across neighbouring regions is required. An IGMRF of order 2 is an improper GMRF whose precision matrix is singular with null space spanned by a constant vector 1_n and a linear vector (1, ..., n)^T.
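To make Definition 1 concrete, the generalized determinant |Q|* can be computed as the product of the nonzero eigenvalues of Q. The sketch below (our own illustration, with hypothetical function names) evaluates the improper-GMRF log-density for an RW1 precision and exploits the fact that, since Q1 = 0, the density is invariant to adding a constant to x:

```python
import numpy as np

def igmrf_logdens(x, Q, rank_def, tol=1e-8):
    """Log-density of a zero-mean improper GMRF with SPSD precision Q of rank
    n - rank_def; |Q|* is the product of the nonzero eigenvalues of Q."""
    n = Q.shape[0]
    ev = np.linalg.eigvalsh(Q)
    ev = ev[ev > tol]                 # drop the rank_def null-space eigenvalues
    assert ev.size == n - rank_def
    return (-0.5 * (n - rank_def) * np.log(2 * np.pi)
            + 0.5 * np.sum(np.log(ev))
            - 0.5 * float(x @ Q @ x))

# RW1 precision (unit precision parameter), an IGMRF of order 1 with rank n - 1
n = 6
D = np.diff(np.eye(n), axis=0)        # first-order difference matrix
Q = D.T @ D
x = np.array([0.1, -0.2, 0.0, 0.3, -0.1, -0.1])
# since Q 1 = 0, igmrf_logdens(x, Q, 1) equals igmrf_logdens(x + c, Q, 1)
```

The shift invariance is exactly why the linear constraints mentioned above are needed: the level of x is not identified by the improper density.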
A popular example is the second order random walk (RW2; Lindgren and Rue, 2008), widely used for modelling smooth covariate effects in STAR models, and often employed in spatio-temporal disease mapping for the main temporal effect β1 when smoothness in the disease risk over time is anticipated. All the IGMRFs described above have in common that their precision matrix can be written as Q = τR, where τ is a precision parameter and R is a known structure matrix that encodes the dependence structure. In particular, for the RW1 the structure matrix has diagonal entries R_kk equal to the number of neighbours of time point k (1 at the endpoints, 2 otherwise) and off-diagonal entries R_kl = −1 if k ∼ l, where the notation k ∼ l indicates contiguous time points. For the ICAR, the structure matrix is given by R_kk = m_k and R_kl = −1 if k ∼ l (0 otherwise), where m_k is the number of neighbours of region k and the notation k ∼ l indicates contiguous areas that share a common border. The structure matrix of a RW2 can be written as R = D^T D, where D is a second order difference matrix of dimension (n − 2) × n. It is common in the disease mapping literature to consider one or both main effects f1 and f2 as a sum of structured and unstructured effects, so that model (1) becomes

η_ij = α + β_{1,i} + ε_{1,i} + β_{2,j} + ε_{2,j} + δ_ij,    (2)

where ε1 ∼ N(0, τ_{ε1}^{−1} I_{n1}) and ε2 ∼ N(0, τ_{ε2}^{−1} I_{n2}). Typically, a RW1 or RW2 model is assumed for the temporal effect, β1 ∼ N(0, τ1^{−1} R1^−), and an ICAR for the spatial effect, β2 ∼ N(0, τ2^{−1} R2^−), where the notation M^− indicates the generalized inverse of the matrix M. The combination of the structured and unstructured spatial terms β_{2,j} + ε_{2,j} is commonly known as the BYM model (Besag et al., 1991). We now describe the interaction term δ in Eq. (2). Smoothness is induced by assuming δ ∼ N(0, τ12^{−1} R_I^−), which is a Kronecker product IGMRF with precision Q = τ12 R_I, i.e. an improper GMRF whose precision structure is the Kronecker product of the structure matrices of two IGMRFs. These models are used for smoothing spatial and spatio-temporal data, and are the Bayesian equivalent of tensor product spline models (Wahba, 1978). Knorr-Held (2000) envisions four different types of interactions, reported in Table 1.
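The structure matrices above, and the Kronecker products used for the interaction, are easy to build explicitly. The following sketch (ours, with hypothetical helper names) constructs RW and ICAR structures for a toy graph and checks that the rank of a Kronecker product structure is the product of the component ranks:

```python
import numpy as np

def rw_structure(n, order=1):
    """Structure matrix R = D^T D of a random walk of the given order,
    where D is the difference matrix; rank(R) = n - order."""
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)
    return D.T @ D

def icar_structure(adj):
    """ICAR structure from a symmetric 0/1 adjacency matrix:
    R_kk = m_k (number of neighbours of region k), R_kl = -1 if k ~ l."""
    adj = np.asarray(adj)
    return np.diag(adj.sum(axis=1)) - adj

# toy example: n1 = 4 time points (RW1) and n2 = 3 areas on a line graph
R1 = rw_structure(4, order=1)                            # rank 4 - 1 = 3
R2 = icar_structure([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # rank 3 - 1 = 2
R_IV = np.kron(R2, R1)   # Kronecker product structure, rank 2 * 3 = 6
```

The rank deficiency of the Kronecker product is what drives the different numbers of linear constraints required by the four interaction types.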
Interaction type I can be seen as unstructured variation due to unobserved covariates, while interaction types II and III allow the temporal trend to change from location to location and the spatial trend to change over time, respectively, but in an independent manner. Interaction type IV is the most complex one, assuming that the temporal trend changes with location in a spatially dependent way or, equivalently, that the way in which the spatial trend changes over time is time-dependent.

Table 1. Types of interaction as in Knorr-Held (2000). The IGMRF on the interaction parameter vector δ has structure R_I given by a Kronecker product; r1 = 1 or 2 depending on the order of the RW assumed for the time effect.

type | R_I | rank(R_I) | linear constraints on δ
I | I_{n2} ⊗ I_{n1} | n1 n2 | not needed
II | I_{n2} ⊗ R_1 | n2 (n1 − r1) | sum-to-zero over time within each area
III | R_2 ⊗ I_{n1} | n1 (n2 − 1) | sum-to-zero over space at each time point
IV | R_2 ⊗ R_1 | (n1 − r1)(n2 − 1) | sum-to-zero over both time and space

Model (1) includes different precision parameters τ1 and τ2 for smoothing over time and space and an additional one, τ12, controlling the variance of the interaction term, which yields a model able to capture the smooth spatio-temporal structure underlying the data with high flexibility. However, these models have limitations in terms of interpretation of the results, as the precision parameters are not informative about the total variance explained by the associated components, and the priors are not easy to elicit (see Section 2.2). We propose an alternative parametrization to address these issues in Section 3.

There are two main challenges in prior choice for the precision parameters in model (1). The first is the so-called scaling issue that affects IGMRFs in general; Sørbye and Rue (2014) proposed addressing it by scaling the precision structure R so that the geometric mean of the diagonal elements of R^− is 1. In this way, the prior for τ will encode roughly the same degree of complexity across different types of structures and hence will have the same interpretation.
Of particular interest is the spatial case where, after scaling the precision of the ICAR, the prior for the precision parameter becomes transferable across different applications using different graph structures. The second challenge regards the structure of the Kronecker product IGMRF, which can be thought of as an extra layer of flexibility on top of the main effects model. The common practice is to set independent priors on each precision parameter, but this disregards the model structure entirely. Popular choices are a Gamma prior on τ, or half-t and uniform priors on the standard deviation 1/√τ (Gelman, 2006). The Gamma prior has repeatedly been pointed out as a poor choice often made out of convenience; among the reasons it should be avoided is that it forces overfitting or underfitting depending on the choice of its parameters (Frühwirth-Schnatter and Wagner, 2010, 2011; Fong et al., 2010; Simpson et al., 2017; Ventrucci and Rue, 2016). In Section 3 we propose a novel modelling framework where the interaction is seen as a flexible extension of the main effects model, and the prior is set so that the interaction term shrinks to the main effects following the PC prior framework. PC priors have recently been proposed as a way to prevent overfitting, based on four simple principles, which we briefly summarize and illustrate below for the precision parameter τ of a Gaussian random effect. For further details the reader is referred to Simpson et al. (2017). Let π1 denote the density of a model component w with precision parameter τ. This model component can be seen as a flexible extension of a base model with density π0 and τ = ∞ (i.e. absence of random effects). The four principles are:

1. Parsimony: the prior for τ should give proper shrinkage to τ = ∞ and decay with the increasing complexity of π1, so that the simplest model is favoured unless there is evidence for a more flexible one.

2.
The increased complexity of π1 with respect to π0 is measured using the Kullback-Leibler divergence (KLD; Kullback and Leibler, 1951). For ease of interpretation, the KLD is transformed to the unidirectional distance measure d(π1 || π0) = √(2 KLD(π1 || π0)), which can be interpreted as the distance from the flexible model π1 to the base model π0.

3. The PC prior is defined as an exponential distribution on the distance, with rate λ > 0. The PC prior for τ follows by a change of variable transformation, leading in this case to a type-2 Gumbel distribution with parameters (1/2, λ):

π(τ) = (λ/2) τ^{−3/2} exp(−λ τ^{−1/2}).    (3)

4. The parameter λ in (3) can be selected by the user based on prior knowledge of τ (or an interpretable transformation of it, such as the standard deviation). This can be expressed intuitively with a probability statement, e.g. setting U and a such that P(1/√τ > U) = a, so that λ = − log(a)/U. Knowledge of the marginal standard deviation can aid in choosing a sensible value for U; Simpson et al. (2017) provide a practical rule of thumb: once the precision τ is integrated out, the marginal standard deviation of the random effect for a = 0.01 is about 0.31U.

We present below the VP model assuming model (1), but everything applies straightforwardly to model (2) as well; details about the VP version of model (2) can be found in Section 4. From model (1) it is hard to quantify the relative contribution of the main and interaction components to the total variance, because the precision parameters involved are not interpretable in terms of the variance explained by the associated components. Our proposal is to reparametrize model (1) as a weighted sum of two IGMRFs representing the main and interaction components by means of a mixing parameter γ ∈ [0, 1]. We include a further mixing parameter φ ∈ [0, 1] to distribute the variance between the temporal and spatial main effects.
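The scaling rule for the precision's PC prior (principle 4 above) is simple to implement. Below is a small sketch (our own function names, not the authors' code) that picks λ from the probability statement P(1/√τ > U) = a and evaluates the type-2 Gumbel log-density of Eq. (3):

```python
import math

def pc_prec_rate(U, a):
    """Rate lambda of the exponential PC prior on the distance scale, chosen
    so that P(1/sqrt(tau) > U) = a, i.e. lambda = -log(a)/U."""
    return -math.log(a) / U

def pc_prec_logdens(tau, lam):
    """Log-density of the type-2 Gumbel distribution with parameters (1/2, lambda):
    pi(tau) = (lambda/2) * tau^(-3/2) * exp(-lambda * tau^(-1/2))."""
    return math.log(lam / 2.0) - 1.5 * math.log(tau) - lam / math.sqrt(tau)

lam = pc_prec_rate(U=1.0, a=0.01)
# on the standard-deviation scale the prior is exponential with rate lambda,
# so P(1/sqrt(tau) > U) = exp(-lam * U), which equals a by construction
```

The exponential form on the standard-deviation scale is what makes the elicitation statement P(1/√τ > U) = a exact rather than approximate.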
Assuming model (1), the reparametrized version of the linear predictor is

η_ij = α + (1/√τ) [ √(1 − γ) ( √(1 − φ) β_{1,i} + √φ β_{2,j} ) + √γ δ_ij ],    (4)

where τ > 0 is an overall precision parameter, 0 < γ < 1 and 0 < φ < 1. We consider a RW1 or a RW2 prior on the temporal main effect β1 and an ICAR prior on the spatial main effect β2, as specified in Section 2. Note that, differently from model (1), the precision structures R̃1 and R̃2 have been scaled according to Sørbye and Rue (2014). The interaction term δ is modelled as a Kronecker product IGMRF; following Knorr-Held (2000) we consider interaction types I, II, III and IV as described in Table 1. Model (4) includes the same vectors of random effects as model (1), but in contrast to model (1) we now have very intuitive hyperparameters: τ is the total precision, i.e. τ^{−1} is the total generalized variance, and γ and φ are two interpretable mixing parameters. The value of γ can be interpreted as the proportion of the total variance explained by the interaction δ. The variance explained by the main effects is therefore τ^{−1}(1 − γ); 1 − φ quantifies the proportion of this variance attributable to the temporal random effects β1, with φ being the proportion attributed to the spatial random effects β2. We need to assign priors to the overall precision parameter τ and the mixing parameters γ and φ. In the next section we focus on the prior for γ, and leave the prior choice for the remaining parameters to Section 4. Our choice of a PC prior for γ follows naturally from the model reparametrization in Eq. (4) and provides a way of eliciting the prior very intuitively. Furthermore, it avoids overfitting by construction, hence guaranteeing a parsimonious model. Our PC prior for γ (see Result 1 below) is based on the assumption that the interaction model in (4) shrinks to the main effects model (β1 + β2).

Result 1. Let us assume a model of the form (4); for all types of interaction in Table 1:

1.
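The stated interpretation of (τ, γ, φ) — the interaction receiving a share γ of the total variance 1/τ, and the main effects splitting the remainder as (1 − φ) to time and φ to space — implies a one-to-one mapping to the component-wise precisions of the classical parametrization. A sketch (our own, with hypothetical helper names; the variance-share reading is the assumption):

```python
def vp_to_classic(tau, gamma, phi):
    """Map the VP hyperparameters to component-wise precisions, assuming the
    components receive variance shares (1-gamma)(1-phi), (1-gamma)*phi and
    gamma of the total variance 1/tau."""
    tau1 = tau / ((1 - gamma) * (1 - phi))   # temporal main effect
    tau2 = tau / ((1 - gamma) * phi)         # spatial main effect
    tau12 = tau / gamma                      # interaction
    return tau1, tau2, tau12

def classic_to_vp(tau1, tau2, tau12):
    """Inverse map: recover (tau, gamma, phi) from component variances."""
    v1, v2, v12 = 1 / tau1, 1 / tau2, 1 / tau12
    total = v1 + v2 + v12
    return 1 / total, v12 / total, v2 / (v1 + v2)

# a round trip illustrates that the mapping is one-to-one
t, g, p = classic_to_vp(*vp_to_classic(2.0, 0.3, 0.7))
```

This is why the VP model has exactly as many hyperparameters as the classical one: nothing is lost, only reorganized into interpretable quantities.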
The distance from the base model, and the resulting PC prior π(γ | θ) in Eq. (5), are derived in Supplemental material A.1-A.3, where the proofs can be found. The scaling of the PC prior for γ, i.e. the choice of θ in Eq. (5), is done by defining the probability of a tail event on γ. The parameter θ controls the strength of the penalty for deviating from the base model: the higher the θ, the greater the penalty. We suggest setting U and a such that P(γ < U) = a; θ is then obtained by numerically solving

∫_0^U π(γ | θ) dγ = a.

Note that it is not possible to assign equal weight to the main and interaction terms in the model, i.e. U = a = 0.5, because of the constraint a > √U. However, we can always encode a fair amount of uncertainty into the prior by choosing a close to 1 and large values of U. In the left panel of Figure 1, θ is obtained using a = 0.99 and three different values for U. A large U allows for more flexibility, as the corresponding density curve decreases steadily towards zero as γ increases, while for a small value of U the density curve drops towards zero quite sharply, strongly penalizing any deviation from the base model. For comparison, the right panel in Figure 1 shows the prior on γ that corresponds to using a Gamma prior on all three precision parameters in model (1), for three different parameter choices. The figure illustrates how the resulting prior on γ depends strongly on the chosen values of the Gamma parameters, going from one extreme to the other in terms of prior weight on the base model. Results from a simulation study reported in Supplemental material B indicate that the posterior mean estimates of γ are reasonably close to the true value under different scenarios. We have observed stable results for several choices of U, unless one defines an inflexible prior on purpose, with most of the probability mass placed near the base model (e.g. when adopting a = 0.99 and a small U = 0.05). Results are comparable to those obtained using a uniform prior on γ unless there is no interaction (i.e.
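Solving P(γ < U) = a for θ only needs the prior's CDF and a one-dimensional root search. The paper's exact distance function d(γ) is derived in its supplement; the sketch below uses d(γ) = √γ purely as a stand-in (our assumption), which conveniently reproduces the solvability constraint a > √U, since P(γ < U) tends to √U as θ → 0:

```python
import math

def pc_gamma_cdf(u, theta, d=math.sqrt):
    """CDF at u of a PC-type prior for gamma on (0, 1): exponential with rate
    theta on the distance scale, truncated to distances d([0, 1]).
    d(gamma) = sqrt(gamma) is a stand-in, NOT the paper's exact distance."""
    return math.expm1(-theta * d(u)) / math.expm1(-theta * d(1.0))

def solve_theta(U, a, lo=1e-6, hi=1e4):
    """Find theta such that P(gamma < U) = a by bisection; larger theta puts
    more mass near the base model gamma = 0, so the CDF increases in theta."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pc_gamma_cdf(U, mid) < a:
            lo = mid      # not enough mass below U yet: increase the penalty
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = solve_theta(U=0.5, a=0.99)
```

With the stand-in distance, a = 0.99 and U = 0.5 give a finite θ since 0.99 > √0.5 ≈ 0.71, whereas U = a = 0.5 would have no solution, mirroring the constraint stated above.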
γ = 0), in which case the uniform prior leads to greater bias when the population at risk is small. As introduced in Section 2, model (2) is more common in practice and is indeed the model adopted in this section for both case studies. In the case of structured and unstructured main effects, another pair of mixing parameters, ψ1 and ψ2, can be included to further distribute the variance, so that model (4) becomes model (6), in which ψ1 and ψ2 split the variance of the temporal and spatial main effects between their structured and unstructured (iid) components, with β1, β2 and δ as in model (4). Result 1 about the PC prior for γ still holds; see Supplemental material A.3. Note that the parameters in model (6) are identifiable, as the model is just a reparametrized version of the classic space-time interaction model (2), where each random effect has its own precision parameter. The number of parameters is exactly the same; in fact, it can be shown that there is a one-to-one mapping between the parameters of the two versions of the model. As in model (1), appropriate constraints need to be imposed to ensure identifiability of the terms in (6). The constraints on the interaction term are summarized in Table 1, while for the temporal and spatial structured main effects it is enough to impose a sum-to-zero constraint. In the next two examples, we use the PC prior in Eq. (3) for τ and the PC prior in Eq. (5) for γ. Regarding φ, ψ1 and ψ2, we simply choose a uniform prior on (0, 1) for each of them, but other choices are possible. In fact, a PC prior could also be used for φ following the work of Fuglstad et al. (2020), who also consider the use of a Dirichlet prior whose base model attributes equal weight to each component, thus expressing ignorance about how the variance is distributed. Similarly, one could use a PC prior on each of ψ1 and ψ2 as in Riebler et al. (2016), taking as base model the absence of structured effects. All the VP models presented in the next two examples were run using R-INLA (Rue et al., 2009); see code in Supplemental material D.
We illustrate our model using the Ohio lung cancer data (Knorr-Held, 2000; Knorr-Held and Besag, 1998; Waller et al., 1997), available at http://www.biostat.umn.edu/~brad/data2.html. These data report yearly counts of lung cancer deaths among white males from 1968 to 1988 in the 88 counties of Ohio. The left panel of Figure 2 displays the time series of the mortality rate for all counties. Our aim is not to find the best model for these data, but to show what our approach adds in terms of interpretability of the results compared to a classical analysis as performed in Knorr-Held (2000). Let y_ij be the number of deaths at time i = 1, ..., 21 in county j = 1, ..., 88 and pop_j the population at risk in county j; we consider the model proposed in Knorr-Held (2000), assuming structured and unstructured effects for both the spatial and temporal main effects, plus a space-time interaction term. The classical parametrization in Knorr-Held (2000) follows, with the main effects modelled as β1 ∼ N(0, τ1^{−1} R1^−), β2 ∼ N(0, τ2^{−1} R2^−), ε1 ∼ N(0, τ_{ε1}^{−1} I_{n1}) and ε2 ∼ N(0, τ_{ε2}^{−1} I_{n2}), where R1 and R2 are the unscaled structure matrices of a RW1 (for time) and an ICAR (for space). The space-time interaction is modelled by a Kronecker product IGMRF built on the precision matrices of the structured components β1 and β2. All the interaction types in Table 1 are considered in the following analysis. This model would require priors for the precision hyperparameters τ1, τ2, τ_{ε1}, τ_{ε2}, τ12. Instead of working with the above model, we assume the VP model in Eq. (6), with scaled structure matrices and the priors for τ, γ, φ, ψ1, ψ2 stated in Section 4. For the PC prior on τ we set a = 0.01 and U = 1/0.31, following the rule of thumb described in Section 2.2. The PC prior for γ is scaled by imposing U = 0.5 and a = 0.99; results (not shown here) were stable for varying U = {0.05, 0.5, 0.95}.
Table 2 reports various model selection criteria for the VP model, for interaction types I, II, III and IV, namely the DIC (Spiegelhalter et al., 2002), the WAIC (Watanabe, 2013), the leave-one-out log score (LOOLS), computed as −Σ_{i=1}^{n} log π(y_i | y_{−i}), and the log marginal likelihood (logMLIK), log π(y | M), which quantifies the likelihood of the data y under a given model M. PC priors make the marginal likelihood a simple and effective tool for fair model comparison when the compared models have a similar structure and only differ in one particular component (Sørbye and Rue, 2018; Ventrucci et al., 2020). Assume M1 and M2 are the interaction type I and type II models, respectively: these models are the same except for the type of interaction. The Bayes factor (Kass and Raftery, 1995) comparing M2 to M1 is

BF_21 = π(y | M2) / π(y | M1) = [π(M2 | y) / π(M1 | y)] × [π(M1) / π(M2)].    (8)

The scale parameter θ of the PC prior for γ, which controls the decay rate from the base model (the model with no interaction), has to be chosen for both M1 and M2. We can handle this choice conveniently by setting the same θ for M1 and M2 (and equal prior model probabilities), so that π(M2)/π(M1) in Eq. (8) cancels out and the Bayes factor reduces to the posterior odds. For all four interaction models in Table 2 we follow this strategy and set the same decay rate for the PC prior on γ. The advantage is that the a priori contribution of the interaction to the total (generalized) variance is the same no matter which interaction type is assumed; this is a desirable feature when one has to choose, among the different types of interaction models, the one that best fits the data. We therefore suggest comparing the logMLIK values for model choice purposes here; furthermore, the DIC is known to favour complexity in models with many random effects (Riebler and Held, 2010). From Table 2 we see that the DIC, WAIC and LOOLS point to type II (followed by type IV) as the best model; a similar conclusion based on the DIC was reached in Knorr-Held (2000).
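Since logMLIK is reported per model, the Bayes factor between two interaction types is just the exponentiated difference of their logMLIK values. A trivial sketch with made-up numbers (not values from Table 2):

```python
import math

def bayes_factor(logmlik_2, logmlik_1):
    """Bayes factor BF_21 = pi(y | M2) / pi(y | M1), computed from the log
    marginal likelihoods for numerical stability."""
    return math.exp(logmlik_2 - logmlik_1)

# illustrative, made-up logMLIK values for two competing interaction types
bf = bayes_factor(-1500.2, -1503.9)   # exp(3.7), i.e. roughly 40 in favour of M2
```

Working on the log scale avoids underflow: marginal likelihoods themselves are astronomically small, while their log difference is a well-behaved number.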
Interestingly, logMLIK is largest for type I, which indicates that the model with main effects plus individual-level random effects capturing unstructured variation may be a better description of the Ohio data. To show the gain of using our approach compared to a classical analysis, we start by discussing some plots of the main effects obtained under the type I interaction. The top right panel of Figure 2 displays the estimated main temporal effect, on the scale of the linear predictor, decomposed into its structured and unstructured (iid) components. The unstructured effect looks very flat compared to the structured one, which is probably responsible for most of the temporal variation in the relative risk. The relative risk increases roughly linearly in time, with a less steep increase towards the end of the time window. The bottom panels of Figure 2 display the estimated structured (left) and iid (right) spatial effects on the scale of the linear predictor. Here the unstructured effect shows larger variability than the structured one, which exhibits a very smooth spatial gradient from north-west to south-east. A visual inspection of this sort, also possible when the classical model is used, gives useful insights into the spatial and temporal patterns in the data. However, it does not allow a proper quantification of the variance attributable to the various sources (main effects, interaction, spatial and temporal effects, etc.), whereas this quantification is readily available from our VP model. Table 3 reports the mean (with the 2.5 and 97.5 quantiles in brackets) of the posterior distribution of the mixing parameters γ, φ, ψ1, ψ2, from which we can understand quantitatively the contribution of the main sources of variation.
In particular, rows 1 and 2 report the total variance partitioned into interaction versus main effects; rows 3 and 4 quantify how the variance attributable to the main effects is partitioned into space and time; rows 5 to 6 and 7 to 8 give the variance partitioning for structured versus iid effects for space and time, respectively. The main findings about the variation of the spatio-temporal mortality risk pattern are as follows. First, the estimated contribution of the interaction is about 4.8%, which means that the main effects are responsible for most of the variability in mortality risk, with the interaction playing a minor role in describing these data. This is reasonable for non-infectious diseases such as cancer (Abellan et al., 2008). Second, space is responsible for about 87.5% of the variation in risk explained by the main effects, which is hard to grasp from only looking at the plots of the main temporal and spatial effects in Figure 2. This result highlights the fact that lung cancer in Ohio had, in the period of time considered, larger variability over space than over time, which could be informative for policy makers and epidemiologists and may help generate hypotheses on the role played by possible environmental risk factors in the region. Third, within the main spatial and temporal effects, the structured component is predominant for time, while the iid component is predominant for space. However, these estimated contributions, and in particular the latter, are affected by greater uncertainty than the previous estimates, indicating that the data are less informative about the posterior for ψ_1 and ψ_2 than they are for γ and φ. These findings are stable across the different types of interaction (see Supplemental material C). We use the VP model to study Covid-19 incidence variations across space and time in Italy.
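The nested structure of the mixing parameters makes it straightforward to convert posterior means into shares of the total variance. A small Python sketch (the orientation of φ, ψ_1, ψ_2 and the numerical values are illustrative assumptions, not the paper's exact estimates):

```python
def variance_shares(gamma, phi, psi1, psi2):
    """Split the total variance into the five sources discussed in the text:
    gamma = share of the interaction,
    phi   = share of space within the main effects (assumed orientation),
    psi1  = structured share within the time main effect (assumed),
    psi2  = structured share within the space main effect (assumed)."""
    main = 1.0 - gamma
    return {
        "interaction":      gamma,
        "space structured": main * phi * psi2,
        "space iid":        main * phi * (1.0 - psi2),
        "time structured":  main * (1.0 - phi) * psi1,
        "time iid":         main * (1.0 - phi) * (1.0 - psi1),
    }

# values in the spirit of the Ohio discussion: ~4.8% interaction,
# ~87.5% of the main-effect variance attributed to space
shares = variance_shares(gamma=0.048, phi=0.875, psi1=0.9, psi2=0.3)
```

By construction the five shares sum to one, which is what makes the table (and the variance partitioning plot) directly comparable across models.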
Data cover all of the 107 Italian provinces and span a period of time that goes from the onset of the pandemic on 24th February 2020 to late July 2021, for a total of 70 weeks; the full dataset is made available by the Italian National Institute of Health through the website https://github.com/pcm-dpc/COVID-19. Data are originally available on a daily basis, but we aggregate them by week to smooth out artefactual patterns mainly due to delays in reporting new cases. The final dataset consists of weekly counts of new Covid-19 cases y_ij, for week i = 1, . . . , 70 and province j = 1, . . . , 107, and the population at risk for each province, pop_j. Our goal is to analyze the sources of variation in Covid-19 incidence rates on a scale between 0 and 1, which is easy to interpret and visualize and provides a clear idea of the contribution of each source. We follow the ideas in Picado et al. (2007) in considering the interaction term as a measure of local heterogeneity, which can be seen as an indirect measure of how effective the control measures are. Hence a primary interest is to quantify the contribution of the interaction to the total variability, i.e. the posterior estimate for γ. Our second interest is to investigate changes in the estimated local heterogeneity across geographical macro-regions and time windows. We run two analyses: in the first one we fit the VP model to the full dataset; in the second one we fit the same VP model to separate subsets of the data, constructed using combinations of geographical area, with levels north (N), centre (C) and south (S), and pandemic wave, with levels W1 and W2. The first wave (W1) covers the first 18 weeks and roughly indicates the national lockdown period, while the second wave (W2) covers the rest of the time frame and indicates the period where restriction measures were set at a regional level. Data are displayed in Figure 3. We consider the binomial model in Eq.
(7), where structured and unstructured random effects are specified for both space and time as main effects. We model the temporally structured effects as a RW1 (as we do not anticipate smoothness) and the spatially structured effects as an ICAR, and assume a type IV space-time interaction to capture potential complex space-time patterns which are not explained by the main space and time components. In this particular example, the spatial main effect may reflect differences in the public health policy strategies adopted in each area (for example, different testing rates across provinces). Again, we avoid the classic parametrization and take advantage of the VP approach described in (6). Doing so, we can elicit the prior easily and describe the various sources of variability in the data in an intuitive way in terms of the mixing parameters γ, φ, ψ_1, ψ_2. Available information on the nature of the disease can be used to aid in parameter choice for the PC priors on γ and τ. Since we know that we are dealing with a contagious disease which evolves over time, possibly in a different manner across provinces, we anticipate a relevant contribution of the interaction term. Thus we choose θ in Eq. (5) by setting U = 0.95, a = 0.99, which implies a large probability that γ < 0.95. In choosing the scale parameter λ of the PC prior for τ in Eq. (3), we consider the scale of the logit-transformed incidence rates and use the rule of thumb described in Simpson et al. (2017), imposing a marginal standard deviation equal to 2 for the incidence rates on the linear predictor (logit) scale. The left panel in Figure 4 reports the variance partitioning plot for the full Covid-19 dataset; this plot is just a graphical version of the variance partitioning table that was presented in Table 3 for the Ohio lung cancer data. This plot resembles the graphs in Gelman (2005) that summarize ANOVA results in terms of the estimated standard deviation for each batch of random effects in the model.
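The scaling parameter θ implied by a pair (U, a) can be found numerically. Assuming the PC prior density derived in the appendix, π(γ) = θ exp(−θ√γ) / (2√γ (1 − e^{−θ})), its CDF is F(γ) = (1 − e^{−θ√γ}) / (1 − e^{−θ}), and θ solves F(U) = a; a bisection sketch in Python:

```python
import math

def pc_cdf(g, theta):
    """CDF of the PC prior for the mixing parameter (form assumed from the appendix)."""
    return (1.0 - math.exp(-theta * math.sqrt(g))) / (1.0 - math.exp(-theta))

def solve_theta(U, a, lo=1e-8, hi=200.0, iters=200):
    """Bisection for theta such that P(gamma < U) = a;
    for U < 1, pc_cdf(U, .) is increasing in theta, so bisection applies."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pc_cdf(U, mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = solve_theta(U=0.95, a=0.99)   # the (U, a) pair used for the Covid-19 analysis
```

Note that as θ → 0 the CDF at U tends to √U, so not every (U, a) pair is attainable; with U = 0.95 the probability a must exceed √0.95 ≈ 0.975, which the choice a = 0.99 satisfies.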
Our variance partitioning plot follows the same idea but represents the contribution of each source on a (0, 1) scale. The main effects account for the greatest proportion of the total variation. Within the main effects, the variability in incidence rates is mostly driven by the spatial component, in particular by the unstructured part of it (although the corresponding posterior estimates are highly uncertain), while for the temporal part it is the structured component that explains most of the variability. The middle and right panels in Figure 4 report the variance partitioning plots for the models fitted to different subsets of the full dataset, to investigate whether the spatio-temporal pattern in Covid-19 cases is consistent or not across geographical areas (N, C, S) and pandemic waves (W1, W2). It is interesting to see that the impact of the interaction term is greater in the second wave than in the first one for all three areas, suggesting greater local heterogeneity during the second wave. This could reflect the fact that restriction measures went from being national in the first wave to being regional in the second one, so we expect greater heterogeneity over space during the latter. Within the first wave, the main effects are responsible for a greater proportion of variation in all three areas, but that attributable to the interaction is slightly greater in the South, followed by the North and then the Centre. In this paper, we revisit spatio-temporal disease mapping, with particular attention to the interaction models discussed in Knorr-Held (2000), and propose a new model that allows variance partitioning among the main effects and the space-time interaction. When defining priors on the hyperparameters that control the complexity of each intrinsic GMRF component, it is important to bear in mind that the main effects belong to the null space of the interaction term.
This means that the interaction can naturally be regarded as an extension of the model including the main effects alone. This idea leads to a model reparametrization where a mixing parameter γ balances the contribution of the main and interaction effects to the total variance. The proposed approach implicitly defines a joint prior on the precision parameters of the various terms in the classic parametrization of the model. The advantages of this reparametrization are twofold. On the one hand, prior choice can be made in an intuitive manner using a PC prior, avoiding the issue of eliciting priors on hard-to-interpret precision parameters. In space-time disease mapping, the nature of the disease can provide useful information to elicit the prior; for example, for non-infectious diseases such as the one considered in the first case study, most of the variation is expected to be explained by the main effects (Abellan et al., 2008). This knowledge can be easily passed on to the PC prior for the mixing parameter γ, while coding this information into a precision parameter in the classic parametrization would be far from easy. On the other hand, the posterior for γ becomes a useful tool to investigate variations in disease risk on a very practical scale and can provide useful epidemiological insights. We have illustrated the use of the VP model in two examples; the variance partitioning tables and plots summarize the contribution of the different sources of variation in terms of the proportion of explained (generalized) variance. In a broader perspective, our work falls within the framework of variance distributing models as introduced by Fuglstad et al. (2020), and adds to the literature in considering intrinsic GMRF models.
The variance partitioning approach proposed here may be adopted in all those applications where intrinsic GMRFs are used as tools to perform smoothing in more than one dimension; for instance, in the analysis of grid data such as those arising from agricultural field trials, or spatio-temporal data from environmental studies and ecological surveys. For ease of presentation, we first prove Result 1 for model (4), Section 3 of our paper, with type IV interaction in Appendix A.1, and then show that it is also valid for types I, II and III in Appendix A.2. Details regarding the proof for the model including unstructured and structured main effects, model (6) Section 4 of our paper, can be found in Appendix A.3. Throughout the proof, we assume a RW2 model on the temporal random effect and an ICAR on the spatial one. The modification of the proof when a RW1 model is used on the temporal effect is straightforward. Model (1) in our paper can be written in general form (in the linear predictor scale) as

x = τ^{−1/2} ( (1 − γ)^{1/2} ω_0 + γ^{1/2} ω_1 ),

where τ > 0 is the precision parameter, 0 < γ < 1 is the mixing parameter, and ω_0, ω_1 are n-dimensional IGMRFs with precision matrices Q_0 and Q_1 respectively, with Q_1 = R̃_2 ⊗ R̃_1 for the type IV interaction and Q_0 the precision matrix of the Kronecker-expanded main effects, where R̃_1 and R̃_2 are the scaled structure matrices of a RW2 and an ICAR, respectively. Note that rank(R̃_1) = n_1 − 2 and rank(R̃_2) = n_2 − 1, so it follows that rank(Q_1) = (n_1 − 2)(n_2 − 1) = n_1 n_2 − n_1 − 2n_2 + 2 and rank(Q_0) = n_1 + n_2 − 3. For ease of presentation, we simplify the notation and denote n = n_1 n_2 and r = n_1 + 2n_2 − 2, so that rank(Q_1) = n − r. It is immediate to see that the rank of Q_0 does not exceed the rank deficiency r of Q_1, i.e. n_1 + n_2 − 3 ≤ n_1 + 2n_2 − 2 ⇔ −1 ≤ n_2, which always holds, so that rank(Q_0) = r − l, where l = r − rank(Q_0) ≥ 0. For ease of presentation, we assume l = 0 (if l > 0 the adjustment of the proof is straightforward). Consider τ = 1 without loss of generality.
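The rank bookkeeping above can be verified numerically on small examples; a Python/NumPy sketch, using the second-difference penalty for the RW2 and a path-graph Laplacian as a simple connected ICAR structure (an illustrative choice of graph):

```python
import numpy as np

def rw2_structure(n):
    """Unscaled RW2 structure matrix R = D2'D2, with D2 the second-difference matrix."""
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D2.T @ D2

def path_icar_structure(n):
    """Graph Laplacian of a path graph: a simple connected ICAR structure."""
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(axis=1)) - A

n1, n2 = 7, 5                                  # time points, areas
R1, R2 = rw2_structure(n1), path_icar_structure(n2)
Q1 = np.kron(R2, R1)                           # type IV interaction structure

assert np.linalg.matrix_rank(R1) == n1 - 2     # RW2 null space: level and trend
assert np.linalg.matrix_rank(R2) == n2 - 1     # connected ICAR null space: level
r = n1 * n2 - np.linalg.matrix_rank(Q1)        # rank deficiency of Q1
assert r == n1 + 2 * n2 - 2
```

Since the eigenvalues of a Kronecker product are the products of the factors' eigenvalues, the rank of Q_1 is the product of the ranks, which is what the assertions check.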
To derive the PC prior for γ we study the limiting behaviour of KLD(π_1 || π_0) as γ_0 → 0, i.e. under the base model. The distributions π_1 and π_0 are defined as follows:

π_1 = N(0, Σ_1),  Σ_1 = (1 − γ) Q_0^- + γ Q_1^-;
π_0 = N(0, Σ_0),  Σ_0 = (1 − γ_0) Q_0^- + γ_0 Q_1^-.

The KLD is given by:

KLD(π_1 || π_0) = (1/2) { trace(Σ_0^{−1} Σ_1) − n + log( |Σ_0| / |Σ_1| ) }.   (10)

Expression (10) can be computed easily if we consider the eigendecomposition of the matrices Q_0 and Q_1, with eigenvector matrices

V_{Q_0} = [e_1, e_2, . . . , e_r, e_{r+1}, . . . , e_n];  V_{Q_1} = [ê_1, . . . , ê_r, ê_{r+1}, . . . , ê_n],

where Λ_{Q_0}, Λ_{Q_1} represent the diagonal matrices of eigenvalues and V_{Q_0} and V_{Q_1} the matrices whose columns are the associated eigenvectors. A common eigenvector basis V can be formed as

V = [e_1, e_2, . . . , e_r, ê_{r+1}, . . . , ê_n].

If l > 0 there would be a set of eigenvectors associated to zero eigenvalues in both matrices Q_0 and Q_1 contemporarily, so the common basis can still be formed. Matrices Σ_0 and Σ_1 can be re-expressed as

Σ_0 = V [ (1 − γ_0) Λ^-_{Q_0} + γ_0 Λ^-_{Q_1} ] V^T;  Σ_1 = V [ (1 − γ) Λ^-_{Q_0} + γ Λ^-_{Q_1} ] V^T,

where Λ^-_{Q_0} and Λ^-_{Q_1} are diagonal matrices with elements λ_i^- and λ̂_i^-, i.e. the reciprocals of the non-null eigenvalues λ_i and λ̂_i, and zero otherwise. Note that Q_0 and Q_1 are singular; following Simpson et al. (2017), the determinants in (10) are generalized determinants, i.e. products over the non-null eigenvalues only. First, we compute trace(Σ_0^{−1} Σ_1), for which we need the diagonal diag(Σ_0^{−1} Σ_1). Let us define

α(γ, γ_0)_i = [ (1 − γ) λ_i^- + γ λ̂_i^- ] / [ (1 − γ_0) λ_i^- + γ_0 λ̂_i^- ],

so that we can re-express the diagonal as diag(Σ_0^{−1} Σ_1) = (α(γ, γ_0)_1, . . . , α(γ, γ_0)_n). The trace simplifies to

trace(Σ_0^{−1} Σ_1) = Σ_{i=1}^{n} α(γ, γ_0)_i

(note that if l > 0, then the indices i = r − l + j, for j = 1, . . . , l, correspond to directions that are null in both matrices and give no contribution). Second, we compute log( |Σ_0| / |Σ_1| ) in (10):

log( |Σ_0| / |Σ_1| ) = Σ_{i=1}^{n} log( 1 / α(γ, γ_0)_i ).

It results:

KLD(π_1 || π_0) = (1/2) Σ_{i=1}^{n} { α(γ, γ_0)_i − 1 − log α(γ, γ_0)_i }.

Below we compute the term α(γ, γ_0)_i for i = 1, . . . , r and i = r + 1, . . . , n:

• i = 1, . . . , r (λ_i > 0, λ̂_i = 0): α(γ, γ_0)_i = (1 − γ)/(1 − γ_0);
• i = r + 1, . . . , n (λ_i = 0, λ̂_i > 0): α(γ, γ_0)_i = γ/γ_0.

Note that the eigenvalues of Q_0 and Q_1 turn out to be irrelevant for computing the KLD, as they cancel out in the α(γ, γ_0)_i terms above. Finally, the KLD is:

KLD(π_1 || π_0) = (1/2) { r [ (1 − γ)/(1 − γ_0) − 1 + log( (1 − γ_0)/(1 − γ) ) ] + (n − r) [ γ/γ_0 − 1 + log( γ_0/γ ) ] }.   (15)

For γ_0 → 0 and γ_0 < γ < 1, the dominant term in expression (15) is (n − r) γ/γ_0. Therefore, the distance from the base model, measured as d(γ) = sqrt(2 KLD(π_1 || π_0)), behaves like

d(γ) = c √γ

for a constant c > 0 that does not depend on γ. Since 0 ≤ d(γ) ≤ c, assigning a truncated exponential with rate λ on d(γ) we have

π(d) = λ e^{−λ d} / (1 − e^{−λ c}),  0 ≤ d ≤ c.

Applying a change of variable and reparametrizing θ = λ c leads to the PC prior for γ:

π(γ) = θ e^{−θ √γ} / ( 2 √γ (1 − e^{−θ}) ),  0 < γ < 1,

which completes the proof.
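As a sanity check on the change of variable, the resulting density integrates to one; a quick numerical verification in Python using the substitution d = √γ (θ = 1.7 is an arbitrary test value):

```python
import math

def pc_prior_pdf(g, theta):
    """PC prior for gamma as derived above:
    pi(gamma) = theta * exp(-theta*sqrt(gamma)) / (2*sqrt(gamma)*(1 - exp(-theta)))."""
    return theta * math.exp(-theta * math.sqrt(g)) / (
        2.0 * math.sqrt(g) * (1.0 - math.exp(-theta)))

def integral_check(theta, m=100000):
    """Midpoint rule on d = sqrt(gamma): the transformed integrand
    theta * exp(-theta*d) / (1 - exp(-theta)) is bounded on (0, 1)."""
    total = 0.0
    for k in range(m):
        d = (k + 0.5) / m                       # gamma = d**2, d(gamma) = 2*d*dd
        total += pc_prior_pdf(d * d, theta) * 2.0 * d / m
    return total
```

Integrating in d rather than in γ avoids the integrable singularity of the density at γ = 0, where π(γ) behaves like 1/√γ.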
From Appendix A.1, it is clear that the proof works provided that a common eigenbasis can be found for the matrices Q_0 (which is the same as in Appendix A.1) and Q_1 (which changes depending on the type of interaction). We first illustrate that this is the case for interaction types I, II and III, and then show that the KLD remains unchanged.

Interaction type I. For the type I interaction, Q_1 = I_{n_2} ⊗ I_{n_1}, so it has a single eigenvalue equal to 1 with multiplicity n_1 n_2. Given that any vector of R^{n_1 n_2} is an eigenvector of Q_1, it is enough to use the eigenvectors from the eigendecomposition of Q_0 as a common eigenbasis.

Interaction type II. For the type II interaction, Q_1 = I_{n_2} ⊗ R̃_1 has 2n_2 eigenvectors associated to null eigenvalues, and n_2(n_1 − 2) eigenvectors associated to non-null eigenvalues, which come from the tensor product of non-null eigenvectors of the matrices I_{n_2} and R̃_1. Let e^{R_1}_1, . . . , e^{R_1}_{n_1−2} be the eigenvectors associated to non-null eigenvalues of R̃_1; the first n_1 − 2 eigenvectors associated to non-null eigenvalues of the matrix Q_0 are

1_{n_2} ⊗ e^{R_1}_1, . . . , 1_{n_2} ⊗ e^{R_1}_{n_1−2},   (16)

while the first n_1 − 2 eigenvectors associated to non-null eigenvalues of the matrix Q_1 are

e_1 ⊗ e^{R_1}_1, . . . , e_1 ⊗ e^{R_1}_{n_1−2},

where e_1 is the first eigenvector of the identity matrix I_{n_2}. We can eigendecompose the identity matrix using the eigenbasis for R̃_2, so that e_1 = 1_{n_2}; this guarantees that a common matrix of eigenvectors V can be found. In particular, it would be formed of the n_1 + n_2 − 3 non-null eigenvectors from Q_0 and the n_1 n_2 − 2n_2 non-null eigenvectors from Q_1. Note that these two collections of vectors will have n_1 − n_2 − 3 vectors in common from the eigenvectors in (16) if n_1 > n_2 + 3.

Interaction type III. In the type III interaction, Q_1 = R̃_2 ⊗ I_{n_1} has n_1 eigenvectors associated to null eigenvalues and n_1 n_2 − n_1 eigenvectors with non-null eigenvalues. In particular, let e^{R_2}_1, . . .
, e^{R_2}_{n_2−1} be the eigenvectors associated to non-null eigenvalues of R̃_2; the following are n_2 − 1 eigenvectors associated to non-null eigenvalues of the matrix Q_0:

e^{R_2}_1 ⊗ 1_{n_1}, . . . , e^{R_2}_{n_2−1} ⊗ 1_{n_1},

while for the matrix Q_1 we find the following n_2 − 1 eigenvectors associated to non-null eigenvalues:

e^{R_2}_1 ⊗ e_1, . . . , e^{R_2}_{n_2−1} ⊗ e_1,   (17)

where e_1 is the first eigenvector of the identity matrix I_{n_1}. Similarly to the type II interaction, we can use the eigenbasis for R̃_1 to eigendecompose I_{n_1}, so that a common eigenbasis can be found. It would be formed of the n_1 + n_2 − 3 non-null eigenvectors from Q_0 and the n_1 n_2 − n_1 non-null eigenvectors from Q_1. Note that these two collections of vectors will have n_2 − 3 vectors in common from the eigenvectors in (17) if n_2 > 3. Regarding the KLD, which is calculated based on the eigenvalues of Q_0 and Q_1, whenever the rank of Q_0 is not smaller than the rank deficiency of Q_1, there will be a number of pairs of eigenvalues that are not zero contemporarily. This number is equal to n_1 + n_2 − 3 in the type I, n_1 − n_2 − 3 in the type II and n_2 − 3 in the type III interaction. Nevertheless, the contribution of the corresponding terms α(γ, γ_0)_i to the KLD is minimal, and the dominant term when γ_0 → 0 remains the same as shown in Appendix A.1 for the type IV interaction, so the PC prior does not change. In the case of structured and unstructured main effects, the matrix Q_0^- now includes both the structured and iid components for the time and space main effects, and rank(Q_0) ≤ n_1 + n_2.

Interaction type IV. Following the proof in Appendix A.1, it is enough to show that rank(Q_0) ≤ 2n_2 + n_1 − 2. Given that the rank of Q_0 is at most n_1 + n_2, the rank condition is true provided that 0 ≤ n_2 − 2, i.e. that there are at least 2 spatial locations, which is always true in practice.

Interaction types I, II, III. For interaction types I, II and III it is still possible to find a common eigenbasis, as adding a constant to the diagonal of a matrix does not change its eigenvectors.
The eigenvalues do change though, so now the number of eigenvalues that are not zero contemporarily in Q_0 and Q_1 (whenever the rank of Q_0 is not smaller than the rank deficiency of Q_1) is n_1 + n_2 − 1 for type I, n_1 − n_2 − 1 for type II and n_2 − 1 for type III, and the dominant term in the KLD remains the same as before. We run a simulation study to investigate the performance of the VP model when using the PC prior for γ proposed in Section 3.1, Eq. (5) in our paper. We generate datasets based on the space and time patterns estimated from the Covid-19 data described in Section 4.2 in our paper; to limit the computational burden we select a subset of the full dataset (north provinces, wave 1) with n_1 = 17 weeks and n_2 = 47 provinces. Assuming i and j are indices for weeks and provinces, respectively, we simulate data as

y_ij ∼ Binomial(pop_j, μ_ij),

where pop_j is the population in province j and μ_ij the Covid-19 incidence rate at week i in province j. The vectors β_1 = (β_{1,1}, . . . , β_{1,n_1})^T, β_2 = (β_{2,1}, . . . , β_{2,n_2})^T and δ = {δ_ij}, i = 1, . . . , n_1, j = 1, . . . , n_2, contain the posterior means for the time, space and space-time random effects, respectively. These estimates come from the VP model (Eq. (4) in Section 3 in our paper) fitted to the Covid-19 data, north provinces, wave 1, assuming a type IV interaction (see the top panels in Figure 5 for the time and space main effects). We further assume τ = 1, φ = 0.5 and keep them fixed throughout the simulation study, while letting the mixing parameter γ vary, in order to create different scenarios according to the contribution of the interaction to the total (generalized) variance. Our goals are: 1) to check how well the true γ is recovered when estimated using our VP model (Eq. (4), Section 3 in our paper), where, as an estimator for γ, we take the posterior mean; 2) to assess sensitivity to the choice of θ, the scaling parameter for the PC prior on γ.
The following scenarios are considered regarding the contribution of the interaction to the total variance:

• SC1: γ = 0 (additive model, no interaction);
• SC2: γ = 1/10 (low interaction);
• SC3: γ = 1/3 (moderate interaction);
• SC4: γ = 2/3 (strong interaction).

Scenario SC1 (γ = 0) assumes an additive model where the time pattern remains the same across provinces. Scenarios SC2 and SC3 represent cases of, respectively, low and moderate interaction. SC4 is intended as a limiting case where the interaction between the space and time main effects is very strong; as we can see from Figure 5 (bottom right panel), where one simulated dataset under SC4 is displayed, the temporal pattern can vary substantially across provinces and some of them show a decreasing trend at the beginning of the first wave period, which is clearly unrealistic for Covid-19. We consider different scenarios by letting the number of trials of the Binomial model, pop_j, j = 1, . . . , n_2, vary. The following three sample size (i.e. population at risk) levels are considered:

• Actual sample size: the population in province j is taken as pop_j;
• Smaller sample size: the population in province j is taken as pop_j / 10;
• Larger sample size: the population in province j is taken as pop_j · 10.

The second scenario represents a smaller sample size case, where the data carry less information about γ, thus we expect less accuracy in the model estimates; analogously, the third scenario represents a case where the data are more informative about γ, hence we expect the model to provide improved estimates. We simulated 100 datasets under SC1, SC2, SC3 and SC4, for each of the three sample size levels described above. The VP model (Eq. (4), Section 3 in our paper) was fitted to each dataset assuming a RW1 for the time main effect, an ICAR for the space main effect and a type IV space-time interaction. All the computations were done using R-INLA.
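The scenario construction above can be sketched in a few lines; a hedged Python illustration of the mechanism (the surface values and the √(1−γ), √γ weighting of the linear predictor are illustrative assumptions, with pure-Python Bernoulli sums standing in for Binomial draws):

```python
import math
import random

def simulate_counts(eta_main, eta_inter, gamma, pop, seed=1):
    """Simulate Binomial(pop_j, mu_ij)-style counts where the linear predictor
    mixes a fixed main-effect surface and an interaction surface with weights
    sqrt(1-gamma) and sqrt(gamma) (illustrative sketch of the scenarios)."""
    rng = random.Random(seed)
    counts = []
    for em, ei, n in zip(eta_main, eta_inter, pop):
        eta = math.sqrt(1.0 - gamma) * em + math.sqrt(gamma) * ei
        mu = 1.0 / (1.0 + math.exp(-eta))          # inverse logit
        counts.append(sum(rng.random() < mu for _ in range(n)))
    return counts

# gamma = 1/3 mimics the moderate-interaction scenario SC3;
# effect surfaces and populations here are hypothetical
y = simulate_counts([-3.0, -2.5], [0.2, -0.1], gamma=1.0 / 3.0, pop=[500, 500])
```

Setting gamma = 0 reproduces the additive scenario SC1, where only the main-effect surface drives the simulated counts.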
The VP model was fitted under 4 different prior choices for γ:

• prior 1: PC(U = 0.05, a = 0.99);
• prior 2: PC(U = 0.5, a = 0.99);
• prior 3: PC(U = 0.95, a = 0.99);
• prior 4: Uniform(0, 1).

The first three priors consider the different scalings of the PC prior displayed in Figure 1 of our paper. In this way, robustness of the results to the choice of U can be tested. The values U = {0.05, 0.5, 0.95} reflect, respectively, an inflexible, moderate and flexible prior on the space-time interaction random effects. We also estimated the model using a uniform prior for γ. Regarding τ and φ, we assigned a Gumbel type 2 PC prior to τ and a Uniform(0, 1) to φ to express ignorance about the variance contribution of space (and time). We considered two different scalings of the PC prior on τ (U = 2/0.31 and U = 100/0.31), but they did not make any difference to the posterior estimates. Figures 6 to 9 report the boxplots of the posterior mean of γ obtained by fitting the VP model to the 100 simulated datasets under the four scenarios SC1, SC2, SC3 and SC4. The horizontal dashed line represents the true γ set by simulation in each scenario. Each figure has three panels that refer to the actual (left), smaller (central) and larger (right) sample size cases. The four boxplots in each panel correspond to different priors on γ: the PC priors with scalings U = {0.05, 0.5, 0.95} and the Uniform prior. Regarding SC1 (Figure 6), where the true γ is zero, we see that the Uniform prior implies a larger bias than the PC prior choices do, which is presumably due to the Uniform being prone to overfitting. This behaviour is more evident in the small sample size case, as a result of the data being less informative about the proportion of variance explained by the interaction. In scenarios where the true γ > 0 (i.e. SC2, SC3 and SC4) we generally observe a negative bias under all prior choices; however, the bias becomes smaller as the sample size increases. Regarding the first aim of the study, i.e.
checking the ability to recover the true γ set by simulation, we can conclude that estimation of γ is reasonable in all cases. We would like to emphasize that while the bias achieved under the uniform prior is always slightly smaller than that obtained with the PC priors in scenarios SC2, SC3 and SC4, it becomes much larger in SC1 because of the tendency of the uniform to overfit. This highlights a fundamental advantage of PC priors, which avoid overfitting by default as they shrink to the base model γ = 0 by construction. Regarding our second aim, i.e. studying sensitivity of the results to the choice of θ, we notice that as long as the inflexible choice U = 0.05 is avoided, the mixing parameter γ is estimated fairly well using the moderate and flexible choices, U = 0.5 or 0.95. In particular, U = 0.5 and U = 0.95 return comparable estimates of the mixing parameter γ under all scenarios. From these results, we suggest that in the absence of strong prior information on γ, a PC prior with U = 0.95, a = 0.99 is a reasonable weakly informative prior that allows flexibility and at the same time avoids model overfitting. As regards estimation of φ and τ, results (not reported here) show that the true values φ = 0.5 and τ = 1 are accurately estimated in all scenarios under all priors.

Figure 12: Variance partitioning plots for the Covid-19 example; the analyses here refer to a different subset of the data for each combination of the factors geographical area, with levels north (N), centre (C) and south (S), and pandemic wave, with levels W1 and W2.

References

Use of space-time models to investigate the stability of patterns of disease.
An incremental Knox test for the determination of the serial interval between successive cases of an infectious disease.
Stochastic Environmental Research and Risk Assessment.
Bayesian analysis of space-time variation in disease risk.
Spatial interaction and the statistical analysis of lattice systems (with discussion).
Bayesian image restoration, with two applications in spatial statistics.
Second-order analysis of space-time clustering.
Regression: models, methods and applications.
Bayesian inference for generalized linear mixed models.
A note on intrinsic conditional autoregressive models for disconnected graphs.
Stochastic model specification search for Gaussian and partial non-Gaussian state space models.
Bayesian variable selection for random intercept modeling of Gaussian and non-Gaussian data.
Intuitive joint priors for variance parameters.
Analysis of variance: why it is more important than ever.
Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).
In spatio-temporal disease mapping models, identifiability constraints affect PQL and INLA results.
Bayes factors.
Bayesian modelling of inseparable space-time variation in disease risk.
Modelling risk from a disease in time and space.
On information and sufficiency.
Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology.
On the second-order random walk model for irregular locations.
Bayesian disease mapping: past, present, and future.
Disease Mapping: From Foundations to Multidimensional Modeling.
Space-time interaction as an indicator of local spread during the 2001 FMD outbreak in the UK.
The analysis of heterogeneous time trends in multivariate age-period-cohort models.
An intuitive Bayesian spatial model for disease mapping that accounts for scaling.
Review of methods for space-time disease surveillance.
Gaussian Markov Random Fields.
Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations (with discussion).
Penalising model component complexity: a principled, practical approach to constructing priors.
Scaling intrinsic Gaussian Markov random field priors in spatial modelling.
Fractional Gaussian noise: prior specification and model comparison.
Bayesian measures of model complexity and fit (with discussion).
PC priors for residual correlation parameters in one-factor mixed models.
Penalized complexity priors for degrees of freedom in Bayesian P-splines.
Improper priors, spline smoothing and the problem of guarding against model errors in regression.
Disease mapping and spatial regression with count data.
Hierarchical spatio-temporal mapping of disease rates.
A Widely Applicable Bayesian Information Criterion.

Note that the model can be estimated using the usual inla call; the R package inlaVP was written to aid the user in setting the interaction type, building the constraints and defining the joint prior. The R package inlaVP is not on CRAN yet.

dat <- merge(covid_italy, dat.tmp, by = c(...))
dat.sort <- dat[order(dat$id.int), ]  # IMP: sorting the interaction indices is needed
intercept[italy_graph$cc$nodes[[i]]] <- i
intercept <- as.factor(intercept)
dat.sort <- merge(dat.sort, data.frame(id.province = 1:italy_graph$n, intercept.cc = intercept))
# ... is needed when using the joint prior (jp) inside control.expert = list(jp = ...)
m(dat.sort$id.province, igmrf.type = 'besag', ...)
### the user must specify 'hyper', with the scaling parameters of the PC priors for tau and gamma:
hyper <- list(...)