A new avenue for Bayesian inference with INLA

Janet van Niekerk, Elias Krainski, Denis Rustand, Haavard Rue

2022-04-14

Abstract

Integrated Nested Laplace Approximations (INLA) has been a successful approximate Bayesian inference framework since its proposal by Rue et al. (2009). Its computational efficiency and accuracy, compared with sampling-based methods for Bayesian inference such as MCMC, are major contributors to this success. Ongoing research on the INLA methodology and its implementation in the R package R-INLA ensures continued relevance for practitioners and improved performance and applicability of INLA. The era of big data and some recent research developments present an opportunity to reformulate some aspects of the classic INLA formulation, to achieve even faster inference, improved numerical stability and better scalability. The improvement is especially noticeable for data-rich models. We demonstrate the efficiency gains with various examples of data-rich models: Cox's proportional hazards model, an item-response theory model, a spatial model including prediction, and a three-dimensional model for fMRI data.

Bayesian inference for latent Gaussian models can be done using 'exact' MCMC-based methods or approximate methods. Integrated Nested Laplace Approximations (INLA) [38] is an approximate method that deterministically performs inference for latent Gaussian models. Latent Gaussian models include various common and widely used statistical models, among others generalized linear mixed models (GLMM), generalized additive mixed models (GAMM), smoothing spline models, spatial and spatio-temporal models, and survival and joint models.

More than 250 studies on COVID-19 used INLA to perform Bayesian inference; see, among others, [32, 10, 37, 17, 5, 9, 16]. Public health studies on HIV and associated risk factors were performed using INLA by [11, 33, 2, 45]. In environmental statistics, air pollution and its health effects were modeled by [41, 40], while models for water and soil pollution were considered by [20]. In ecology, [36, 35, 21] analyzed data on forest fires, and species distribution modeling was performed by [15, 27, 8, 31]. These are merely a selected few of the recent studies that entail Bayesian inference completed with INLA, and from this the impact of INLA on the applied sciences is evident.

The classic formulation of the INLA methodology, as presented by [38], includes the linear predictors as part of the latent field. This formulation has various benefits, such as automatic inference for the linear predictors and conditional independence in the likelihood contributions, since each datapoint depends on only one element of the latent field. Furthermore, it gives a unified framework to go beyond Gaussian approximations, like using simplified Laplace approximations, which are much more accurate, especially for the mean. On the downside, the latent field gets artificially large, with parts that are almost singular. As an illustration, consider a simple regression model $y_i = a + b x_i + \epsilon_i$ with $n$ observed pairs $(x_i, y_i)$. The classic INLA formulation represents this as a graph of size $n + 2$ ($a$ and $b$ plus the $n$ linear predictors), whereas the modern formulation uses a graph of size 2 to represent $a$ and $b$, as the linear predictors are no longer an explicit part of the latent field.
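As a concrete illustration of the two formulations, here is a minimal R-INLA sketch, assuming a recent version of the package in which the formulation is selected through the `inla.mode` argument (the modern formulation was released under the name "experimental"); the simulated data and all variable names are ours:

```r
library(INLA)

# Simulate the simple regression y_i = a + b * x_i + eps_i
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1)
dat <- data.frame(y = y, x = x)

# Classic formulation: the n linear predictors are part of the latent field
fit_classic <- inla(y ~ x, data = dat, inla.mode = "classic")

# Modern formulation: the latent field holds only the intercept and slope
fit_modern <- inla(y ~ x, data = dat, inla.mode = "experimental")

# Posterior summaries agree; the timing gap widens as n grows
fit_classic$summary.fixed
fit_modern$summary.fixed
c(classic = fit_classic$cpu.used[["Total"]],
  modern  = fit_modern$cpu.used[["Total"]])
```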
In most cases this does not cause a serious issue, but it can be leveraged to show an advantage in comparisons [44, 34]. As applications push the boundaries more and more, we needed to reformulate the internal model representation and how computations and approximations are done. Of vital importance here is the new variational Bayes correction for the mean developed in [49], which allows us to gain higher accuracy in the mean, at essentially no added cost, in the new model representation. The new model formulation adds a clear computational gain in data-rich scenarios, where the data is much larger than the model itself, and sometimes of higher dimension. Additionally, the new formulation gives improved numerical stability and a better framework for future developments. We will now present some basic concepts of INLA and the classic formulation, and then in Section 2.1 present the modern formulation.

Suppose we have response data $y_{n \times 1}$ with density function $\pi(y \mid \mathcal{X}, \theta)$ and link function $h(\cdot)$, linked to some covariates $Z = \{X, U\}$ through the linear predictors

$$\eta = \beta_0 \mathbf{1} + X\beta + \sum_k f^k(u_k), \quad (1)$$

such that $\{f\}$ are unknown functions of $U$ and $\beta$ contains the coefficients for the linear effects of $X$ on $\eta$. The inferential aim is to estimate $\mathcal{X}_{m \times 1} = \{\beta_0, \beta, f\}$, the set of unobserved latent variables (the latent field), and $\theta$, the $p$-variate vector of hyperparameters. When a Gaussian prior is assumed for $\mathcal{X}$, the hierarchical statistical model in (1) is a latent Gaussian model (LGM). Latent Gaussian models form a broad class that includes Gaussian processes for time series, spatial and spatio-temporal data, and clustered random effects models for data with a grouping structure. The formulation given in (1) covers any model where each of the $f^k(\cdot)$ terms can be written in matrix form as $A_k u_k$. It is still not exhaustive, as volatility models can also be fitted with INLA, as in [28] and [4]. The latent Gaussian model is the hierarchical Bayesian model

$$y \mid \mathcal{X}, \theta \sim \prod_{i=1}^{n} \pi(y_i \mid \eta_i, \theta), \qquad \mathcal{X} \mid \theta \sim N(\mathbf{0}, Q^{-1}(\theta)), \qquad \theta \sim \pi(\theta), \quad (2)$$

where $\theta$ contains the hyperparameters from the likelihood (like shape or precision parameters) and from the latent prior (like precision or weight parameters), and $\theta$ is allowed to assume any (reasonable) prior. For inference we then need the marginal posteriors $\pi(\theta_j \mid y)$ and $\pi(\mathcal{X}_j \mid y)$.

We now review the classic model formulation of [38]. The linear predictors $\eta$ are included in an augmented latent field, as the first $n$ elements of $\mathcal{X}$, together with the $m$ model components as defined in (1). This augmentation results in a singular covariance matrix of $\mathcal{X}$, since $\eta$ is a deterministic function of the other elements of $\mathcal{X}$. To circumvent the singularities, we redefine

$$\eta = \beta_0 \mathbf{1} + X\beta + \sum_k f^k(u_k) + \varepsilon, \qquad \varepsilon \sim N(\mathbf{0}, \tau^{-1} I), \quad (3)$$

with large fixed $\tau$. We thus add a tiny noise term to the linear predictors, to move from a singular to a non-singular covariance matrix of $\mathcal{X}$. With this formulation, each datapoint depends on only one element of the latent field.
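To make formulation (1) concrete, here is a small R-INLA sketch of an LGM with one linear effect and one unknown function $f$, specified through the package's `f()` machinery; the data generation and names are ours, for illustration only:

```r
library(INLA)

# A small LGM of the form (1): linear effect of x plus a random-walk f(u)
set.seed(2)
n <- 500
x <- rnorm(n)
u <- sample(1:50, n, replace = TRUE)      # index of the unknown function f(u)
f_true <- cumsum(rnorm(50, sd = 0.3))     # a smooth "unknown function"
y <- rpois(n, lambda = exp(0.5 + 0.8 * x + f_true[u]))
dat <- data.frame(y = y, x = x, u = u)

# f() adds the component u_k to the latent field, here with an RW1 prior
fit <- inla(y ~ x + f(u, model = "rw1"),
            family = "poisson", data = dat)

fit$summary.fixed            # posterior of (beta_0, beta)
head(fit$summary.random$u)   # posterior of the f(u) model component
```

The fitted object exposes the latent field exactly in the blocks of (1): `summary.fixed` for $(\beta_0, \beta)$ and `summary.random` for the $u_k$ of each model component.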
Now we can write the joint density of the latent field, hyperparameters and data as

$$\pi(\mathcal{X}, \theta \mid y) \propto \pi(\theta)\, \pi(\mathcal{X} \mid \theta) \prod_{i=1}^{n} \pi(y_i \mid \mathcal{X}_i, \theta). \quad (4)$$

From here the posterior of the hyperparameters can be approximated as

$$\tilde{\pi}(\theta \mid y) \propto \left. \frac{\pi(\mathcal{X}, \theta \mid y)}{\pi_G(\mathcal{X} \mid \theta, y)} \right|_{\mathcal{X} = \mu(\theta)}, \quad (5)$$

where $\pi_G(\mathcal{X} \mid \theta, y)$ is a Gaussian approximation to the true $\pi(\mathcal{X} \mid \theta, y)$, obtained using the Laplace method [47] by matching the mode and the curvature around the mode of $\pi(\mathcal{X} \mid \theta, y)$. Next, the posterior marginals can be approximated by

$$\tilde{\pi}(\mathcal{X}_j \mid y) = \int \tilde{\pi}(\mathcal{X}_j \mid \theta, y)\, \tilde{\pi}(\theta \mid y)\, d\theta, \qquad \tilde{\pi}(\theta_j \mid y) = \int \tilde{\pi}(\theta \mid y)\, d\theta_{-j}.$$

The approximation $\tilde{\pi}(\mathcal{X}_j \mid \theta, y)$ can be obtained in two ways: either from the Gaussian approximation $\pi_G(\mathcal{X} \mid \theta, y)$ in (5), or from another Laplace approximation step,

$$\tilde{\pi}(\mathcal{X}_j \mid \theta, y) \propto \left. \frac{\pi(\mathcal{X}, \theta \mid y)}{\pi_G(\mathcal{X}_{-j} \mid \mathcal{X}_j, \theta, y)} \right|_{\mathcal{X}_{-j} = \mu_{-j}(\theta)},$$

where $\mu_{-j}(\theta)$ is the mode of $\pi_G(\mathcal{X}_{-j} \mid \mathcal{X}_j, \theta, y)$. The Gaussian strategy is computationally cost-effective, but in the case of non-Gaussian likelihoods it can be quite inaccurate, whereas the extra Laplace approximation step (the nested one) is computationally more expensive but provides sufficiently accurate inference. The unified treatment of model components and linear predictors allows us to compute the simplified Laplace approximation (see [38] for details), say, for all components in a unified way. In many cases, the augmentation of the latent field by the linear predictors does not increase the computational cost of inference by much, but for low-dimensional models with high-dimensional data the nested Laplace approximation gets more computationally expensive, mainly due to this augmentation.

As noted, the linear predictors were initially included in the latent field to perform inference in a unified way. The other option at the time was to not include them, and to calculate their posteriors post-inference of the latent field. Under the posterior Gaussian assumption for the latent field and linear predictors this approach is cost-effective, but it can be quite inaccurate, since the tiny errors compound in the post-inference framework. Another option is to construct skew-normal marginals with a Gaussian copula for the linear predictors from the inference of the latent field. This option is more accurate but computationally expensive, and essentially negates the computational benefit of removing the linear predictors from the latent field.

Recent developments by [49] enable us to define a new framework for approximate Bayesian inference using INLA without the linear predictors in the latent field, defined as an extended latent Gaussian model by [44]. The low-rank variational Bayes correction to the posterior means of a Gaussian latent field proposed by [49] can now be used to efficiently correct the posterior means of the latent field, post-inference, and the marginal posteriors of the linear predictors are then obtained by integrating the Gaussian conditional marginals. Without this efficient correction, the results might be inaccurate. We present the proposed methodology in Section 2 and illustrate the new approach with examples in Sections 3 and 4.

In this section we propose the modern formulation of the INLA methodology, with some details on its implementation to attain computational efficiency and sufficiently accurate inference.
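The approximation strategies described above are exposed in R-INLA through the `control.inla` argument; a minimal sketch on simulated binomial data (a toy example of ours, chosen so that the non-Gaussian likelihood makes the strategies differ):

```r
library(INLA)

set.seed(3)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * x))
dat <- data.frame(y = y, x = x)

# Cheapest: Gaussian approximation for pi(x_j | theta, y)
fit_g  <- inla(y ~ x, family = "binomial", data = dat,
               control.inla = list(strategy = "gaussian"))

# The classic compromise: simplified Laplace
fit_sl <- inla(y ~ x, family = "binomial", data = dat,
               control.inla = list(strategy = "simplified.laplace"))

# Most accurate and most expensive: the full (nested) Laplace step
fit_l  <- inla(y ~ x, family = "binomial", data = dat,
               control.inla = list(strategy = "laplace"))

rbind(gaussian   = fit_g$summary.fixed["x", "mean"],
      simplified = fit_sl$summary.fixed["x", "mean"],
      laplace    = fit_l$summary.fixed["x", "mean"])
```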
The main enhancement over the classic formulation is that we do not augment the latent field with the "noisy" linear predictors. The latent field of dimension $m$ is defined as

$$\mathcal{X} = \{\beta_0, \beta, f\}, \quad (6)$$

with Gaussian prior $\mathcal{X} \mid \theta \sim N(\mathbf{0}, Q^{-1}(\theta))$, and the $n$ linear predictors are defined as

$$\eta = A\mathcal{X}, \quad (7)$$

with $A$ a sparse design matrix that links the linear predictors to the latent field. From this formulation, the joint density of the latent field, hyperparameters and data is

$$\pi(\mathcal{X}, \theta \mid y) \propto \pi(\theta)\, \pi(\mathcal{X} \mid \theta) \prod_{i=1}^{n} \pi\big(y_i \mid (A\mathcal{X})_i, \theta\big). \quad (8)$$

With this formulation, the first step is to find the mode and the Hessian at the mode of $\tilde{\pi}(\theta \mid y)$, defined as in (5) by

$$\tilde{\pi}(\theta \mid y) \propto \left. \frac{\pi(\mathcal{X}, \theta \mid y)}{\pi_G(\mathcal{X} \mid \theta, y)} \right|_{\mathcal{X} = \mu(\theta)}.$$

The Gaussian approximation $\pi_G(\mathcal{X} \mid \theta, y)$ to $\pi(\mathcal{X} \mid \theta, y)$ is calculated from a second-order expansion of the likelihood around the mode $\mu(\theta)$ of $\pi(\mathcal{X} \mid \theta, y)$,

$$\log \pi(\mathcal{X} \mid \theta, y) \propto -\frac{1}{2}\mathcal{X}^\top Q(\theta)\mathcal{X} + \sum_{i=1}^{n} \log \pi\big(y_i \mid (A\mathcal{X})_i, \theta\big), \quad (9)$$

giving the Gaussian approximation at the mode $\mu(\theta)$,

$$\mathcal{X} \mid \theta, y \;\dot{\sim}\; N\big(\mu(\theta),\, Q_{\mathcal{X}}^{-1}(\theta)\big), \qquad Q_{\mathcal{X}}(\theta) = Q(\theta) + A^\top D(\theta) A, \quad (10)$$

where $D(\theta)$ is the diagonal matrix of negative second derivatives of the log-likelihood with respect to the linear predictors, evaluated at the mode. Note that with the modern formulation, the graph of the Gaussian approximation consists of two components:

1. $\mathcal{G}_p$: the graph obtained from the prior of the latent field, through $Q(\theta)$;
2. $\mathcal{G}_d$: the graph obtained from the data, based on the non-zero entries of $A^\top D(\theta) A$.

Next, the marginal conditional posteriors of the elements of $\mathcal{X}$ are calculated from the joint Gaussian approximation (10) as

$$\tilde{\pi}(\mathcal{X}_j \mid \theta, y) = \pi_G(\mathcal{X}_j \mid \theta, y) = N\big(\mu_j(\theta), \sigma_j^2(\theta)\big). \quad (11)$$

To get the marginal posterior of $\mathcal{X}_j$, we (numerically) integrate $\theta$ out of (11), using $K$ integration points $\theta_k$ and area weights $\delta_k$ defined by some numerical integration scheme:

$$\tilde{\pi}(\mathcal{X}_j \mid y) = \int \pi_G(\mathcal{X}_j \mid \theta, y)\, \tilde{\pi}(\theta \mid y)\, d\theta \approx \sum_{k=1}^{K} \pi_G(\mathcal{X}_j \mid \theta_k, y)\, \tilde{\pi}(\theta_k \mid y)\, \delta_k. \quad (12)$$

Even though we propose a Gaussian for the conditional marginal posterior of $\mathcal{X}_j$ in (11), the marginal posterior of $\mathcal{X}_j$ will be skewed, due to the integration in (12).

In the classic formulation, the marginal posteriors of the linear predictors are automatically calculated, since they constitute the first $n$ elements of the latent field. With the new formulation of Section 2.1, we need to calculate the marginal posteriors of the linear predictors after we have calculated $\tilde{\pi}(\mathcal{X}_j \mid y)$ and $\tilde{\pi}(\theta \mid y)$, and we need to do this efficiently if we want a computational benefit over the classic INLA framework. To calculate $\tilde{\pi}(\eta_i \mid y)$, we first calculate $\tilde{\pi}(\eta_i \mid \theta, y)$. We postulate a Gaussian density for $\eta_i \mid \theta, y$, such that $\tilde{\pi}(\eta_i \mid \theta, y) = \pi_G(\eta_i \mid \theta, y)$, with, from (10),

$$E(\eta \mid \theta, y) = A\,E(\mathcal{X} \mid \theta, y) = A\mu(\theta), \qquad \mathrm{Cov}(\eta \mid \theta, y) = A\,\mathrm{Cov}(\mathcal{X} \mid \theta, y)\,A^\top = A\,Q_{\mathcal{X}}^{-1}(\theta)\,A^\top. \quad (13)$$

Note that $A$ is a sparse matrix, so these computations are efficient. However, in the Gaussian approximation to $\pi(\mathcal{X} \mid \theta, y)$ we calculate the precision matrix $Q_{\mathcal{X}}$, whereas (13) requires the covariance matrix $Q_{\mathcal{X}}^{-1}$, which is very expensive to calculate and store. Conversely, for the marginal conditional posterior of $\eta_i$ we only need the $i$-th diagonal element of $A\,\mathrm{Cov}(\mathcal{X} \mid \theta, y)\,A^\top$. Define the sparse selected inverse of a symmetric sparse matrix $Q$ with graph $\mathcal{G}_{\mathcal{X}}$ as the symmetric matrix $C$ with elements $\{C_{il} : i \sim_{\mathcal{G}_{\mathcal{X}}} l \text{ or } i = l\}$. The elements of the sparse selected inverse are computed from the Cholesky decomposition $Q = LL^\top$ as follows (see [46] and [38] for details):

$$C_{il} = \frac{\delta_{il}}{L_{ii}^2} - \frac{1}{L_{ii}} \sum_{k > i,\; k \sim_{\mathcal{G}_{\mathcal{X}}} i} L_{ki}\, C_{kl}, \quad (14)$$

where $\delta_{il} = 1$ for $i = l$ and zero otherwise.
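The integration step (12) is just a finite mixture of conditional Gaussians; a self-contained R illustration, with made-up integration points and weights standing in for an actual numerical integration scheme over $\theta$:

```r
# Marginal posterior of x_j as a finite mixture of conditional Gaussians,
# as in (12); theta_k weights and conditional moments are made up here
K       <- 5
post_th <- c(0.10, 0.20, 0.40, 0.20, 0.10)  # pi(theta_k | y) * delta_k, normalized
mu_k    <- c(0.8, 0.9, 1.0, 1.1, 1.2)       # E(x_j | theta_k, y)
sd_k    <- c(0.30, 0.28, 0.25, 0.28, 0.30)  # sd(x_j | theta_k, y)

# Density of the (generally skewed) marginal on a grid
grid <- seq(-0.5, 2.5, length.out = 200)
marg <- sapply(grid, function(x) sum(post_th * dnorm(x, mu_k, sd_k)))

# Posterior mean and variance follow from the mixture moments
m <- sum(post_th * mu_k)
v <- sum(post_th * (sd_k^2 + mu_k^2)) - m^2
c(mean = m, var = v)
```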
This way we only calculate the necessary elements of the inverse, using only the non-zero elements of $L$, provided $Q$ has been properly reordered. Now let $C$ be the sparse selected inverse of $Q_{\mathcal{X}}$. Note that $Q_{\mathcal{X}}$ contains elements from $A^\top D A$, whereas for the variances of the linear predictors we need the diagonal of $A C A^\top$. Since the graph $\mathcal{G}_{\mathcal{X}}$ of $Q_{\mathcal{X}}$ includes the non-zero pattern of $A^\top D A$, we know that if $i$ and $l$ are not neighbours in $\mathcal{G}_{\mathcal{X}}$ then $A_{ji} A_{jl} = 0$ for all $j$, so the diagonal of $A C A^\top$ involves only elements of $C$ that are available in the sparse selected inverse. This ensures a sensible calculation of the variances in (13) using (14). Now that we can calculate the mean and the variance of the conditional posterior of $\eta_j$, we can calculate the marginal posterior of $\eta_j$ using (12):

$$\tilde{\pi}(\eta_j \mid y) \approx \sum_{k=1}^{K} \pi_G(\eta_j \mid \theta_k, y)\, \tilde{\pi}(\theta_k \mid y)\, \delta_k, \quad (15)$$

with resulting posterior moments denoted $E(\eta_j \mid y) = \mu_j$ and $\mathrm{Var}(\eta_j \mid y) = \sigma_j^2$.

The posterior means of $\eta$ and $\mathcal{X}$ might be inaccurate under the Gaussian assumption for the conditional posterior, especially in the case of a non-Gaussian likelihood. The variational Bayes correction to Gaussian means recently proposed by [49] can be used to efficiently calculate an improved mean for the marginal posteriors of the linear predictors, by improving the posterior means of the latent field. [51] showed that Bayes' rule is a 100% efficient information processing rule. Based on this work, we can solve for the posterior through the variational objective

$$E_{q(\mathcal{X} \mid y)}\left[-\log l(\mathcal{X} \mid y)\right] + \mathrm{KLD}\left[q(\mathcal{X} \mid y)\,\|\,\pi(\mathcal{X})\right],$$

where $q(\cdot)$ is a member of the variational class, $\pi(\cdot)$ is the prior and $l(\cdot)$ is the likelihood function. Instead of using this method to calculate the posterior of $\mathcal{X} \mid \theta$, we rather use the variational framework to calculate a posterior correction to the mean of the Gaussian approximation at the mode, as in (10) and (11). There are other ways to find element-wise corrections to the elements of the Gaussian mean, but with element-wise changes there is no guarantee that the overall effect is an improvement. With this variational formulation on the joint mean, we are sure of a joint improvement. Also, this formulation enables a low-rank correction, where influential elements are corrected explicitly and the effect of these corrections is propagated to the other elements, such that the corrections imply an improved joint mean. We present the details of this approach next.

From (10), a mean $\mu(\theta)$ of $\mathcal{X} \mid y, \theta$ is calculated in the Gaussian approximation at the mode of $\pi(\mathcal{X} \mid y, \theta)$. Using a variational framework, we can improve on this mean, as proposed by [49]. To this end, we formulate a low-rank variational Bayes correction of size $p < m$ to $\mu(\theta)$, such that the improved mean is

$$\mu^*(\theta) = \mu(\theta) + M\lambda,$$

where $M$ is a matrix that propagates the correction made to $p$ nodes to the rest of the latent field. The propagation matrix $M$ is derived from selected columns of $Q_{\mathcal{X}}^{-1}(\theta)$ (corresponding to the $p$ nodes of explicit correction), as presented by [49]. The explicit corrections $\lambda$ are solved for iteratively using the variational objective

$$\arg\min_{\lambda}\; E_{\mathcal{X} \mid y, \theta \,\sim\, N(\mu(\theta) + M\lambda,\; Q_{\mathcal{X}}^{-1}(\theta))}\left[-\log l(\mathcal{X} \mid y)\right] + \mathrm{KLD}\left[N\big(\mu(\theta) + M\lambda,\, Q_{\mathcal{X}}^{-1}(\theta)\big)\,\|\,\pi(\mathcal{X} \mid \theta)\right].$$

The expected log-likelihood is calculated using Gauss-Hermite quadrature with $n_J$ (standardized) nodes $x$ and weights $w$, as

$$E\left[\log \pi(y_i \mid \eta_i, \theta)\right] \approx \sum_{j=1}^{n_J} w_j \log \pi\big(y_i \mid \mu_i(\theta) + \sigma_i(\theta)\, x_j,\; \theta\big),$$

where $Q_{\mathcal{X}}(\theta) = L(\theta)L^\top(\theta)$ from a Cholesky decomposition, $\sigma_i(\theta)$ is the conditional standard deviation of $\eta_i$ obtained through (13)-(14), and $\mu_i(\theta) = (A\mu^*(\theta))_i$.
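R-INLA exposes the sparse selected inverse through `inla.qinv()`; a minimal check on a small sparse precision matrix (the matrix itself is a toy construction of ours):

```r
library(INLA)
library(Matrix)

# A small sparse SPD precision matrix: RW1 structure plus a diagonal
m <- 10
R <- bandSparse(m, k = 0:1,
                diagonals = list(c(1, rep(2, m - 2), 1), rep(-1, m - 1)),
                symmetric = TRUE)
Q <- R + Diagonal(m, 0.5)

# Sparse selected inverse: only elements C_il with i ~ l or i = l
C <- inla.qinv(Q)

# The diagonal (the marginal variances) matches the full dense inverse
max(abs(diag(C) - diag(solve(as.matrix(Q)))))
```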
A second-order Taylor series expansion around $\lambda = \mathbf{0}$ is then calculated and used to approximate $E_{\mathcal{X} \mid y, \theta \sim N(\mu(\theta) + M\lambda,\, Q_{\mathcal{X}}^{-1}(\theta))}\left[-\log l(\mathcal{X} \mid y)\right]$. This correction is computationally efficient and achieves accuracy for the mean similar to a full integrated nested Laplace approximation, as shown by [49]. The improved mean can then be used in (15) for posterior inference of $\eta$, with $\pi_G(\eta_j \mid \theta_k, y)$ now centered at $(A\mu^*(\theta_k))_j$:

$$\tilde{\pi}(\eta_j \mid y) \approx \sum_{k=1}^{K} \pi_G(\eta_j \mid \theta_k, y)\, \tilde{\pi}(\theta_k \mid y)\, \delta_k.$$

To find the mode of $\tilde{\pi}(\theta \mid y)$ we need to perform unconstrained optimization. Based on the work of [1], we use the smart gradient approach to calculate the gradient more accurately and efficiently, and hence find the mode and the Hessian at the mode more accurately. Numerical methods for estimating the gradient, such as forward differences, use the canonical basis vectors as difference directions; the smart gradient instead constructs difference directions from the previous descent directions. We subtract those directions already included, through projections, so as to get a unique contribution for each direction. From these projected directions we construct an orthonormal basis using modified Gram-Schmidt (MGS), contained in the columns of $G^{(k)}$, that spans the same space as the original direction vectors. The estimated gradient at iteration $k$ is then

$$\widehat{\nabla} f(\theta^{(k)}) = G^{(k)}\, \widehat{g}^{(k)},$$

where $\widehat{g}^{(k)}$ collects the finite-difference estimates of the directional derivatives of $f$ along the columns of $G^{(k)}$. This smart gradient framework enables a more accurate and expedited location of the mode of $\tilde{\pi}(\theta \mid y)$.

For high-performance computing environments, INLA has been parallelized using the OpenMP and PARDISO libraries. PARDISO is a state-of-the-art library of parallelized, efficient sparse linear algebra solvers, and is employed at all stages of the INLA algorithm. Another level of parallelism is introduced for calculating $\tilde{\pi}(\theta \mid y)$ at multiple $\theta$, and for finding the mode of $\tilde{\pi}(\theta \mid y)$ through a parallel line-search algorithm. This multi-level parallelism has been shown to lead to speed-up factors of up to 10 for large models, and allows the user to efficiently employ the INLA methodology on multi-core architectures. For more details see [13].
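From the user's side, these parallelization choices are controlled through package options; a short sketch from our reading of the R-INLA documentation, with `dat` standing in for any of the datasets from the earlier sketches and a placeholder licence path:

```r
library(INLA)

# Use the PARDISO sparse solver if a licence is available
# (the path below is a placeholder, not a real licence file)
inla.setOption(smtp = "pardiso", pardiso.license = "~/pardiso.lic")
inla.pardiso.check()   # reports whether PARDISO is usable

# Nested parallelism: "A:B" uses A outer threads (e.g. for evaluating
# pi(theta | y) at several theta in parallel), each with B inner
# threads for the sparse linear algebra
fit <- inla(y ~ x, data = dat, num.threads = "4:4")
```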
In this section we illustrate the novelty of the new framework with some well-known data-rich models: the Cox proportional hazards model and item-response theory models. We also illustrate that the improved numerical stability can make a difference, as in models for spatial prediction. We deliberately chose the simulated examples to give moderate speedups, since, due to differences in the computational scaling, we can essentially achieve any speedup we desire. All results were computed on an Ubuntu system with Intel Xeon Gold 6230R 2.1GHz processors and 772GB of RAM.

The likelihood of Cox's proportional hazards model is equivalent to the likelihood of a specific Poisson process, as shown by [14] and [19]. This reformulation is necessary for Cox's proportional hazards model to be classed as a latent Gaussian model. The reformulation entails data augmentation, since for each datapoint in the Cox model we need $k \leq K$ datapoints for the Poisson regression model, with $K$ the number of bins of the baseline hazard, which is assumed piecewise log-constant.

Suppose there is an observed time and censoring indicator $(t, d)$, and assume a baseline hazard with $B$ bins $(s_1, s_2, \ldots, s_B)$. Then for each bin up to the observed time $t$ we have $k$ augmented datapoints: $k - 1$ datapoints observed as 0, each with a Poisson distribution with mean $\lambda_j = \exp(\eta_j)(s_{j+1} - s_j)$, $j = 1, \ldots, k - 1$, and one datapoint, observed as 0 ($t$ is censored) or 1 ($t$ is not censored), that follows a Poisson distribution with mean $\lambda_k = \exp(\eta_k)(t - s_k)$, where the linear predictor $\eta_j$ contains $b_j$, with $\exp(b_j)$ the baseline hazard for bin $j$, and we assume a random walk model of order one or two for $b$ (see [29] for more details). This data augmentation scheme clearly results in a much larger dataset than the original one, and in this case including the linear predictors in the latent field can result in much higher computational cost. We illustrate the advantage of the new approach with Cox proportional hazards models in the next section.

We simulate survival data for $n$ patients from a very simple Cox proportional hazards model, $h_i(t) = h_0(t)\exp(\beta x_i)$, where $x$ is a scaled and centered continuous covariate, and the baseline hazard $h_0(t)$ is estimated using a scaled random walk order one model (see [42]) with 50 bins. We consider four different values of $n$, from $n = 10^2$ to $10^5$, to illustrate the scalability of the new approach. The results are reported in Table 1.

The AIDS dataset contains the information of 2843 patients residing in Australia who were diagnosed with AIDS. The clinical aim is to investigate how age (Age) and transmission category (TC) affect the hazard of death due to AIDS-related complications; for more details on the dataset see [50]. Another interest is the effect of Zidovudine (AZT) therapy, which was introduced in Australia in July 1987. To this end, we added the information on AZT use for those cases after July 1987, to see if there is an effect on the hazard of death. The model we consider contains the effects of TC and AZT together with an unknown non-linear function $f(\text{Age})$ of age, where $f$ and the log baseline hazard $\log h_0(t)$ are both estimated by scaled random walk order two prior models (see [22]) with precision parameters $\tau_f$ and $\tau_h$, respectively. The augmented dataset contains 24 830 records, and the analysis is performed in 0.8 seconds with the new framework. The results (posterior means with 0.025- and 0.975-quantiles of the hazard ratios) are presented in Table 2 and Figure 1. We note that the use of AZT and the category TC=3 are significantly associated with a decreased hazard of death. We also see that age has a significant effect on the hazard, such that the hazard increases with age.

Figure 1: Inference for the non-linear effect of age and the baseline hazard.
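A sketch of fitting a Cox proportional hazards model of this kind in R-INLA, assuming the `coxph` family (which performs the Poisson data augmentation internally) and the `control.hazard` argument for the piecewise log-constant baseline; the data generation and the specific option values are ours:

```r
library(INLA)

# Simulate simple Cox PH data: h(t) = h0(t) * exp(beta * x)
set.seed(4)
n <- 1000
x <- as.numeric(scale(rnorm(n)))          # scaled, centered covariate
time <- rexp(n, rate = exp(0.7 * x))      # h0(t) = 1 for simplicity
cens <- rexp(n, rate = 0.3)               # independent censoring times
d <- as.numeric(time <= cens)             # event indicator
t_obs <- pmin(time, cens)
dat <- data.frame(t_obs = t_obs, d = d, x = x)

# control.hazard sets the (scaled) RW1 model and the number of bins
# for the piecewise log-constant baseline hazard
fit <- inla(inla.surv(t_obs, d) ~ x, family = "coxph", data = dat,
            control.hazard = list(model = "rw1", n.intervals = 50,
                                  scale.model = TRUE))
fit$summary.fixed
```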
Item response theory (IRT) models are often used in the analysis of standardized tests, where multiple students answer multiple questions. In this setting, IRT models are used to estimate the subjects' ability in some school subject. IRT models are, however, also applied in other fields, such as the social sciences, to estimate the ideal point orientation regarding some matter, medicine, for diagnosis from patient outcomes, and marketing, as one can see from [48]. There are a number of R packages that implement IRT models, considering different parameterizations, extensions and inference paradigms; see [7]. For example, Bayesian IRT models can be fitted using Stan, as shown in [6], and specialized efficient algorithms and implementations exist, see [26]. As in many cases, IRT models can be framed as conventional statistical models and, consequently, fitted with conventional statistical modeling software such as INLA, as illustrated in [25]. In the simplest case, the available data are coded as zero or one for the answer from subject $j$ to item $i$, and modeled through a linear predictor built from $\kappa_j$, the subject score, $\alpha_i$, the item difficulty, and $\beta_i$, the item discrimination. Therefore, we have $n + 2m$ parameters, where $n$ is the number of subjects and $m$ is the number of items, with $nm$ available datapoints.

We simulate data for $n$ students responding to 20 items. For each simulation, the model parameters were drawn randomly as $\beta_i \sim \mathrm{Gamma}(20, 20)$, $\alpha_i \sim N(0, 1)$, $\kappa_j \sim N(0, 1)$. We consider various values of $n$ and present the results in Table 3.

We then consider data from the Brazilian national high school exam, the Exame Nacional do Ensino Medio (ENEM). We selected a mathematics test with 45 items, taken by 24956 students of the city of Curitiba in 2020. Figure 2 summarizes the posterior means and 95% credible intervals for all the model parameters; these results were computed in 378 seconds. We can see that higher $\alpha_i$ was generally the case for items with a lower proportion of correct answers. For the discrimination parameters, $\beta_{20}$ is the highest. The student abilities are overall proportional to the proportion of items answered correctly: students with more correct answers have higher $\kappa_j$. The variation in the fitted score for students with the same number of correct answers is due to differences in the item parameters. All these students correctly answered at least one item, and the students who answered all 45 items correctly have the same fit for $\kappa_j$, and the highest mean.

In the analysis of data with a spatial component, models are defined so that the size of the random effect accounting for the spatial correlation is equal to, or of the same order as, the size of the data. In disease mapping, a common approach to analyze counts from areal data is to include two random effects, one structured and one unstructured, each of size equal to the number of observations. In the analysis of point-referenced data using the SPDE approach, the size of the model is usually not the same as the data size, as the model is defined on a mesh (see [3]) with a number of nodes similar to the number of data locations. To illustrate the benefit of the modern INLA formulation in these cases, we consider an SPDE model with 948 mesh nodes (see the supplementary material for details of the data generation procedure). We consider the same mesh and vary the number of data locations, $n$. In Table 4 we report the number of observations and the computing time for the classic and the modern INLA approaches. It is clear that even for non-data-rich models, the computing time based on the modern INLA formulation is still low.

A second point is that one of the goals is often to make predictions at a fine resolution, usually on a fine grid. There are two ways to compute the posterior at the prediction locations: jointly with the model fitting, or after fitting. When the number of prediction locations is large, the latter was usually suggested (see [18]), as the former may cause numerical instability. This numerical instability stems from the fact that the classic approach enlarges the model by including the linear predictors in the latent field, and this part is near singular; see Eq. (4) in [39]. The modern formulation is thus twofold beneficial, as it avoids the numerical instability and it reduces the computing time. We can also fit the model considering only the available data, and then re-fit it including the prediction scenario, reusing the previously found mode of the hyperparameters; we consider this approach in the following evaluation.
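A sketch of this fit-then-refit pattern, using `control.mode` to reuse the hyperparameter mode from the first fit; the model formula and the objects `dat_obs` and `dat_pred` are placeholders of ours (see the SPDE tutorials for complete prediction examples):

```r
library(INLA)

# First fit: observed data only
fit1 <- inla(y ~ x, data = dat_obs)

# Re-fit with the prediction scenario appended: the response is NA at
# the prediction rows, and the hyperparameter mode from fit1 is reused
# (restart = FALSE skips the optimization over theta)
dat_all <- rbind(dat_obs, transform(dat_pred, y = NA))
fit2 <- inla(y ~ x, data = dat_all,
             control.mode = list(theta = fit1$mode$theta, restart = FALSE),
             control.predictor = list(compute = TRUE))
```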
We evaluated the computing time for three different prediction grid sizes, for the model on a mesh with 946 nodes, conditional on $10^4$ observations. The results are reported in Table 5.

Cortical surface fMRI (cs-fMRI) offers dimension reduction and greater neurobiological significance of distances, amongst many other advantages. Nearby regions in cs-fMRI tend to exhibit similar behaviour, while large differences can occur between nearby locations in volumetric fMRI. We can thus use a surface-based spatial statistical model to analyze cs-fMRI data, as proposed by [30], defined for $T$ timepoints and $N$ vertices per hemisphere, resulting in data $y$ of size $TN \times 1$, with a latent Gaussian model in which $\beta_k = \Psi_k w_k$ (an SPDE prior on $\beta_k$, the coefficients of the activation amplitudes $X_k$), where we have $K$ task signals and $J$ nuisance signals.

We analyze test-retest motor task fMRI data from a subject in the Human Connectome Project (HCP) (https://www.humanconnectome.org/). Multi-subject analysis can be done as in [30] and [43], but for illustration of the modern INLA formulation we focus on the single-subject case. The data consist of a 3.5-minute fMRI run per subject, consisting of 284 volumes, where the subject performs 5 different motor tasks interceded with a 3-second visual cue. Each hemisphere of the brain contains 32492 surface vertices, from which 5000 are resampled for the analysis. This results in a response data vector $y$ of size 2 523 624, with an SPDE model (see [23], [3] and [18] for details) defined on a mesh with 8795 triangles. Clearly, the data is much larger than the size of the model, constituting a data-rich model. This imbalance between data and model size is inherent to fMRI data, and the modern formulation of INLA is thus essential for performing full Bayesian inference. In this illustrative example we consider only the left hemisphere and 4 motor tasks, so that we have 4 different activation profiles, each with an SPDE prior on a common mesh. The inference based on the modern formulation of INLA was computed in 148 seconds.

We presented a modern formulation of the INLA methodology, based on a reformulation of the latent field that removes the linear predictors from it. It provides Bayesian inference and stable prediction, also for data-rich models, in a computationally efficient and accurate manner. The code for the examples and application is available at https://github.com/JanetVN1201/Code_for_papers.

References

[1] Smart Gradient - An adaptive technique for improving gradient estimation.
[2] Spatial codistribution of HIV, tuberculosis and malaria in Ethiopia.
[3] Spatial Modeling with R-INLA: A Review.
[4] Integrated nested Laplace approximations for threshold stochastic volatility models. Econometrics and Statistics.
[5] Spatial inequities in COVID-19 testing, positivity, confirmed cases, and mortality in 3 US cities: an ecological study.
[6] Bayesian item response modeling in R with brms and Stan.
[7] R packages for item response theory analysis: Description and features.
[8] Predicting marine species distributions: complementarity of food-web and Bayesian hierarchical modelling approaches.
[9] Community factors and excess mortality in first wave of the COVID-19 pandemic in England.
[10] Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil.
[11] Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017.
[12] FreeSurfer.
[13] Parallelized integrated nested Laplace approximations for fast Bayesian inference. arXiv.
[14] The analysis of rates and of survivorship using log-linear models.
[15] Data integration for large-scale models of species distributions.
[16] Long-term exposure to air-pollution and COVID-19 mortality in England: A hierarchical spatial analysis.
[17] Magnitude, demographics and dynamics of the effect of the first wave of the COVID-19 pandemic on all-cause mortality in 21 industrialized countries.
[18] Advanced spatial modeling with stochastic partial differential equations using R and INLA.
[19] Covariance analysis of censored survival data using log-linear analysis techniques.
[20] Water and soil pollution: Ecological environmental study methodologies useful for public health projects. A literature review.
[21] Empirical analyses of the factors influencing fire severity in southeastern Australia.
[22] On the second-order random walk model for irregular locations.
[23] An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.
[24] The statistical analysis of fMRI data.
[25] Bayesian explanatory additive IRT models.
[26] MCMCpack: Markov Chain Monte Carlo in R.
[27] Species distribution modeling: a statistical review with focus in spatio-temporal issues. Stochastic Environmental Research and Risk Assessment.
[28] Estimating stochastic volatility models using integrated nested Laplace approximations.
[29] Approximate Bayesian inference for survival models.
[30] A Bayesian general linear modeling approach to cortical surface fMRI data analysis.
[31] Disentangling drivers of spatial autocorrelation in species distribution models.
[32] Assessing differential impacts of COVID-19 on black communities.
[33] Mapping geographic clusters of new HIV diagnoses to inform granular-level interventions for HIV epidemic control in western Kenya.
[34] A statistical introduction to template model builder: A flexible tool for spatial modeling.
[35] Prediction of regional wildfire activity in the probabilistic Bayesian framework of Firelihood.
[36] Effects of human-related and biotic landscape features on the occurrence and size of modern forest fires in Sweden.
[37] Risk for COVID-19 infection and death among Latinos in the United States: examining heterogeneity in transmission dynamics.
[38] Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.
[39] Bayesian computing with INLA: A review.
[40] Long-term effect of outdoor air pollution on mortality and morbidity: a 12-year follow-up study for metropolitan France.
[41] Data integration for the assessment of population exposure to ambient air pollution for global burden of disease assessment.
[42] Scaling intrinsic Gaussian Markov random field priors in spatial modelling.
[43] Spatial Bayesian GLM on the cortical surface produces reliable task activations in individuals and groups. NeuroImage.
[44] Fast, scalable approximations to posterior distributions in extended latent Gaussian models.
[45] Spatiotemporal modelling and mapping of cervical cancer incidence among HIV positive women in South Africa: A nationwide study.
[46] Formation of sparse bus impedance matrix and its application to short circuit study.
[47] Fully exponential Laplace approximations to expectations and variances of nonpositive functions.
[48] Handbook of Item Response Theory.
[49] Correcting the Laplace Method with Variational Bayes.
[50] Modern Applied Statistics with S-PLUS.
[51] Optimal information processing and Bayes's theorem.