key: cord-0202831-ct9xg4fp authors: Murakami, Daisuke; Kajita, Mami; Kajita, Seiji; Matsui, Tomoko title: Compositionally-warped additive mixed modeling for a wide variety of non-Gaussian spatial data date: 2021-01-11 journal: nan DOI: nan sha: c007f565691c6fe441cea4f6a4483e9fd9def5dd doc_id: 202831 cord_uid: ct9xg4fp As with the advancement of geographical information systems, non-Gaussian spatial data sets are getting larger and more diverse. This study develops a general framework for fast and flexible non-Gaussian regression, especially for spatial/spatiotemporal modeling. The developed model, termed the compositionally-warped additive mixed model (CAMM), combines an additive mixed model (AMM) and the compositionally-warped Gaussian process to model a wide variety of non-Gaussian continuous data including spatial and other effects. A specific advantage of the proposed CAMM is that it requires no explicit assumption of data distribution unlike existing AMMs. Monte Carlo experiments show the estimation accuracy and computational efficiency of CAMM for modeling non-Gaussian data including fat-tailed and/or skewed distributions. Finally, the model is applied to crime data to examine the empirical performance of the regression analysis and prediction. The result shows that CAMM provides intuitively reasonable coefficient estimates and outperforms AMM in terms of prediction accuracy. CAMM is verified to be a fast and flexible model that potentially covers a wide variety of non-Gaussian data modeling. The proposed approach is implemented in an R package spmoran. A wide variety of spatial and spatiotemporal data is now becoming available. In addition to conventional spatial data collected and published in a top-down manner (e.g., census statistics), an increasing number of sensing data (e.g., remotely sensed images, smart sensor data) and other spatial data assembled, estimated, and disseminated by private companies and volunteers are available in the era of open data (Volunteered Geographic information; see Haklay, 2013) . Such databases include Google Earth Engine (https://earthengine.google.com/), WorldPop (https://www.worldpop.org/), and Worldometer (https://www.worldometers.info/). This rapid growth of spatiotemporal open data drives people to use regression modeling to reveal hidden factors behind social issues such as crime occurrence (e.g., Kajita and Kajita, 2020 ) the spread of COVID-19 (e.g., Sannigrahi et al., 2020) . Despite the fact that spatial data are becoming increasingly diverse, Gaussian spatial regression models, which have been commonly used (e.g., Wikle and Cressie, 2011) , are applicable only to data obeying a Gaussian distribution. Because regression problems are commonplace in spatial fields, an alternative approach for a wide variety of non-Gaussian spatial data is required. According to Yan et al. (2020) , representative modeling approaches for non-Gaussian spatial data are classified with (a) variable transformation and (b) generalized linear modeling. The former converts explained variables, which have a non-Gaussian distribution, to Gaussian variables through a transformation function. Logarithmic transformation (Dowd, 1982) , Box-Cox transformation (Kitanidis and Shen, 1996) , Tukey g-and-h transformation (Xu and Genton, 2017) , and other transformation functions have been used in geostatistics (see Cressie and Wikle, 2011) . In machine learning literature, the warped Gaussian process (WGP; Snelson et al., 2003) is a general framework including aforementioned transformation approaches. It has been extended to Bayesian WGP (Lázaro-Gredilla, 2012) and the compositionally-warped GP (CWGP; Rios and Tober, 2019), which we will focus on later. The generalized linear model (b) is widely used for spatial modeling as well. This model accommodates spatial random effects and has been developed and extended under Bayesian framework (Gotway and Stroup, 1997; Diggle et al., 1998) . While Bayesian approaches can be slow because of the simulation step, fast Bayesian inference including the integrated nested Laplace approximation (Rue et al., 2009 ) and Vecchia-Laplace approximation (Zilber and Katzfuss, 2019) has been developed for fast non-Gaussian spatial regression modeling. The generalized additive (mixed) model (e.g., Wood, 2017) is another popular approach that is applicable for non-Gaussian spatial data modeling (see, Umlauf, 2012) . Computationally efficient estimation algorithms for the generalized additive model, including the fast restricted maximum likelihood (REML; Wood, 2011) and the separation of anisotropic penalties algorithm (Rodríguez-Álvarez et al., 2015) have been developed and implemented in a wide variety of software packages (see Mai and Zhang, 2018 for review). 1 A critical limitation of the above-mentioned approaches include the need to assume data distribution a priori. 2 Because the true distribution behind data is usually unknown in empirical studies, distribution assumption can lead model misspecification. Exceptionally, CWGP, which is categorized in (a), estimates data distribution without explicitly assuming any distribution a priori (see Section 2.2). In short, CWGP is a distribution-free approach. Such an approach is potentially useful for modeling a wide variety of spatial and spatiotemporal data, although the original CWGP cannot consider spatial and/or temporal effects. This study proposes an approach termed compositionally-warped additive mixed modeling (CAMM) as a unified regression approach for a wide variety of Gaussian and non-Gaussian continuous data. This approach is developed by combining CWGP and additive mixed models (AMM) that take roles to estimate data distribution without explicit prior information and quantify the spatial and other smooth and/or group effects depending on covariates, respectively. The remainder of the sections is organized as follows. Section 2 and 3 introduce AMM and CWGP respectively. Then, Section 4 develops CAMM by combining them. Section 5 performs Monte Carlo experiments to examine coefficients-estimation accuracy and computational efficiency of the developed approach. Section 6 employs CAMM to crime analysis in Tokyo, Japan. Finally, Section 7 concludes our discussion. AMM is a regression model accommodating spatial, temporal, and many other effects. The linear AMM is defined as follows: where ! = [4 $ , … , 4 & ]′ is a N × 1 vector of explained variables, X is a N × J matrix of J covariables assuming fixed effects, β is a J × 1 vector of fixed coefficients. " ′ " represents the matrix transpose. 0 % is the variance parameter, 0 is a zero vector, and I is an identity matrix. K is the number of random effects. ' ! is a vector of random effects depending on the k-th variate 7 ! , which is either a matrix or a vector of k-th covariate; for example, ' ! describes a temporal process if 7 ! is defined by a N × 1 There are non-Gaussian spatial modeling approaches other than (a) and (b), including copula-based spatial modeling (e.g., Gräler, 2014) , the Gaussian mixture approach (e.g., Fonseca and Steel, 2011) , and approaches assuming non-Gaussian spatial processes (e.g., Zhang and El-Shaarawi, 2010 1 vector of temporal coordinates whereas spatial process if it is defined by a N × 2 matrix of spatial coordinates. ' ! is specified as follows: where 8 ! is a N × L matrix of L basis functions generated from 7 ! . Orthogonal, radial, and other functions are available to generate basis functions from 7 ! . 9 ! is a vector of random coefficients depending on the variance-covariance matrix ; ! (< ! ) and a set of variance parameters < ! . ' ! can model group effects, non-linear effects, time-varying effects, and spatially varying effects by specifying 8 ! (see Umlauf et al., 2012) . For example, if 8 ! is defined by spatial basis functions generated from spatial coordinates, ' ! yields a spatial process. Moran eigenvectors (Griffith, 2003) , which are extracted from a doubly-centered spatial proximity matrix, are spatial basis functions based on the Moran coefficient (MC; Moran, 1950) . MC takes a positive value in the presence of positive spatial dependence while the value is negative in the presence of negative spatial dependence. The Moran eigenvectors corresponding to positive eigenvalue explain positively dependent map patterns (i.e., MC > 0) whereas those corresponding to negative eigenvalue explain negatively dependent patterns (i.e., MC < 0). Because positive spatial dependence is dominant in most regional data (Griffith, 2003) , the eigenvectors corresponding to positive eigenvalue are usually used to estimate underlying positively dependent spatial processes. Murakami and Griffith (2015) used the following Moran eigenvectorbased specification for modeling positively dependent spatial process: where < ! ∈ {= ! % , A ! }. 8 ! (() is a N × L matrix of the L Moran eigenvectors corresponding to positive eigenvalue and > is a L × L diagonal matrix of the positive eigenvalues. = ! % estimates the variance of the spatial process while A ! estimates the Moran coefficient value or scale of the process (see Murakami and Griffith, 2020a) . In spatial statistics, it is a major concern to analyze spatially varying relationships between explained and explanatory variables (e.g., Brunsdon et al., 1998) . Spatially varying coefficients (SVC) or other varying coefficients can be estimated using AMM by slightly modifying Eq. (1) as follows: where C ! is the k-th column of X, ° is an element-wise (Hadamard) product operator, and ' ! is the k-th varying coefficients vector. Later, ' ! is defined by ' ! (() to model SVCs. The AMMs (Eqs. 1 and 3) have the following expression: ! = #$ + 8 E 9 F + (, … . 9 F~-(/ G , ; G (H)), where 8 E = [8 $ , … , 8 " ] for Eq.(1) and 8 E = [C !°8$ , … , C !°8" ] for Eq.(3). H ∈ {< $ , … , < " }, 9 F = [9 $ , … , 9 " ]′, / G = [/ $ , … , / " ]′, and ; G (H) is a block diagonal matrix whose k-th block is ; ! (< ! ). The likelihood function is given as follows (see Bates and DebRoy, 2004) : where I E = ∑ I ! " !#$ , and the restricted likelihood function as We prefer the restricted maximum likelihood (REML) estimation maximizing Eq. (6) because of its high estimation accuracy and stability shown by Reiss and Ogden (2009) and Wood (2011). 3. Compositionally-warped Gaussian process (CWGP) Snelson et al. (2003) developed the warped GP (WGP) transforming non-Gaussian explained variables ! to Gaussian variables through a transformation (or warping) function Z(). For independent samples, the WGP is defined as where [ 2 (!) = [Z 2 (4 $ ), … , Z 2 (4 & )]′, ] denotes the parameters characterizing the transformation, and μ is a mean vector. Logarithmic, Box-Cox, and other transformations are available for the Z 2 (•) function. Note that the log-normal kriging and trans-Gaussian kriging (see Cressie, 1993) , assuming these transformations are particular types of WGP with a spatially dependent covariance matrix. A drawback of WGP is that the Z 2 (•) function must be given beforehand even though the true data distribution (or transformation function) is unknown. To break the bottleneck, Rios and Tober (2019) proposed the CWGP specifying the transformation function by concatenating D subtransformation functions. The CWGP for independent samples is defined as where Interestingly, they showed that CWGP approximates a wide variety of non-Gaussian distributions without explicitly assuming any non-Gaussian distribution a priori if each transformation is defined by the following function: where ] 5 ∈ {b 5,$ , b 5,% , b 5,7 , b 5,8 } . Based on Rios and Tober (2019), we call Eq.(9) as SAL transformation (Sinh-Arcsinh and Affine where the "L" comes from linear). The likelihood for the CWGP model Eq.(8) yields where Given Eq. (9), the d-th gradient in Eq. (11) yields This section combines CWGP and AMM, and develops the CAMM for non-Gaussian data modeling without explicitly assuming any distribution. The CAMM is defined as with CAMM describes a wide variety of non-Gaussian explained variables through the transformation Eq. (14). While we consider Eq. (13) assuming the varying coefficients x ! y + ' ! on C ! , CAMM can consider many other effects by specifying the random effects term ' ! . As with other (C)WGPs, the likelihood function for our model is readily obtained by adding the Jacobian term ∏ 9: ' (; ( ) 9; ( & 4#$ with the likelihood I +,, ($, H, 0 % ) of the linear AMM as follows: Remember does not include 9 F, the likelihood is further expanded as (see Eq. 5) where On the other hand, the restricted likelihood function for the CAMM is given as where The restricted log-likelihood, which is maximized in the REML, becomes The computational efficiency heavily depends on the computational complexity of log I 1 +,, (H, ], 0 % ) whereas the complexity of ∑ log To make CAMM feasible for very large samples, we extend their algorithm to maximize the restricted log-likelihood). Specifically, we maximize it to estimate the parameters {H, ], 0 % } as follows: (1) Initialize the parameter values zH { , ] | }. (2) Evaluate the inner product matrices It is done by iterating the following calculations: End if the log-restricted likelihood log I 1 _H { , ] | , 0 Ç % ,G a value converges. Otherwise, go back to step (3). Owing to the pre-conditioning step (2) replacing large matrices with small inner product matrices, the heaviest calculation required in the numerical maximization steps (3) Such constraints can be considered in the first transformation Z 2 $ (4 4 ). For example, the Box-Cox transformation, which is defined below, is useful to transform non-negative variables to unconstraint variables, which the SAL transformation implicitly assumes: where Ñ is a parameter determining the non-linearity of the transformation. The Box-Cox transformation yields a linear transformation if Ñ = 1 while log-transformation if Ñ = 0. The logtransformation is available for the same purpose. The log-ratio of other transformations, which has been used for compositional data analysis (e.g., Egozcue et al., 2003) , are available to convert variables with a box constraint to unconstrained variables. The first transformation must be specified carefully considering such value constraints in explained variables. It is well known that the logarithmic and Box- Although not assumed in Rios and Tober (2019), we introduce standardization before and after the SAL transformations to stabilize the estimation. The former is required to unify the initial condition and stabilize the REML estimation. The latter is required to stabilize [ 2 (!); without the latter standardization, the intercept and regression coefficients tend to be extremely large, and the variance of [ 2 (!) as well. Furthermore, in each SAL transformation, the non-negativity of b 5,% and b 5,7 is also assumed to avoid the reversal of the large and small relationship in y. A remaining issue is the selection of D. Fortunately, Bayesian Information Criterion (BIC) or other criterion is available for the selection. Finally, our assumed transformations are summarized in Figure 2 . Remember that the developed approach assumes continuous explained variables. When modeling discrete variables like counts, they must be converted to density, for example, by dividing by area or population. The estimated coefficients β and random effects ' $ , … , ' " quantify the impact of é 4,! on Z 2 (4 4 ) . While these estimates are directly interpretable just like in existing studies using logtransformed explained variables, the marginal effect n4 4 né 4,! ⁄ , which quantifies the influence on the raw explained variables 4 4 is easier to interpret. The marginal effect is expanded as follows: This study generates non-Gaussian explained variables from the following model: where the explanatory variates {é $,4 , é %,4 } are randomly generated from uniform distributions with the intervals [0, 1]. The SVCs {x F,4 , x $,4 , x %,4 } are generated from the spatial moving average processes where ë G,H is the (i, j)-th element of a matrix defined by row-standardizing a spatial proximity matrix whose (i, j)-th element equals exp (−X 4,I /0.5). Spatial coordinates used to evaluate the Euclidean distance X G,H are generated using two independent standard normal distributions. 3 x $,4 is assumed to have a larger variation than the other two by assuming a larger variance than {x F,4 , x %,4 }. Z J,K (•) represents the Tukey g-and-h transformation (Tukey, 1977) that generates skewed and/or fat-tailed distributions, which is for 4 F,4 is defined as The parameters g and h control the skewness and kurtosis (or tail fatness) of the distribution of Z J,K _4 F,4 a. A larger positive g means stronger right skew while a larger negative g means the opposite. A larger h means a fatter tail distribution. 3 In regional analysis, samples are typically concentrated in urban areas. This assumption implies urban areas with dense samples in the center. we also performed simulations assuming î = −0.5 assuming left skew distributions. However, the result is quite similar to the results when î = 0.5; this is probably because distributions given in these two cases are symmetric. We do not report results when î = −0.5. Throughout this section, we use R version 4.0.2 (https://cran.r-project.org/) installed in a Mac Pro (3.5 GHz, 6-Core Intel Xeon E5 processor with 64 GB memory). The coefficient estimation accuracy is evaluated using the root mean squared error (RMSE), which is defined as follows: . The SVC estimates are unbiased if the mean is close to the true values that equal -2.0 for the strong SVC and 0.5 for the weak SVC. Figure 6 shows that the LM and AMM tend to underestimate the coefficients in absolute value. The CAMM estimates have much smaller biases for both the strong and weak SVCs. CAMM is found to be useful to estimate SVCs while avoiding underestimation of the effects from explanatory variables. We In short, Section 5 confirms estimation accuracy and computational efficiency of our proposed approach. In the next section, this approach is applied to a crime analysis in Japan. where ẍ! ,4 is the k-th coefficient for the i-th sample is specified as x ! is a constant, ê ! (ô 4 ) is a spatially varying coefficient (SVC) where ô 4 is the i-th sample site that is given by the geographic center coordinates of the districts, and ê ! (é !,4 ) is a non-spatially varying coefficient (NVC) smoothly varying depending on explanatory variable value. The SVC is given using Eq. (2) to model the positively dependent spatial process while the NVC is given by the natural spline functions generated from é !,4 . ê ! (ô 4 ) and ê ! (é !,4 ) are accepted only if they improve the Bayesian information criterion (BIC); otherwise, they are replaced with zero values. They explain the tendency that crimes tend to repeat in similar regions due to regional heterogeneity (e.g., resident characteristics) and/or repetitive crimes by the same groups (see Farrell, 1995) . In addition, we consider the time-varying intercept ê F (ö 4 ) by quarter ö 4 to eliminate residual temporal dependence while the spatially varying intercept ê F (ô 4 ) defined by Eq. (2) to eliminate residual spatial dependence. They are important because an ignorance of residual spatial and/or temporal dependence can lead to incorrect inferences (see LeSage and Pace, 2009) . We apply AMM with log-transformed explained variables (AMM(log)), which is a popular specification in regional science, and CAMM with/without the Box-Cox transformation (i.e., A and B in Figure 2 ). The number of the SAL transformations is changed between 1 and 4 and compared the Bayesian information criterion (BIC) value to select the best CAMM specification. See Appendix. 3 for details about an implementation of CAMM using the R package "spmoran" (version 0.2.1 onward). Later, we will compare coefficients estimated from the CAMM with D = 2 and the Box-Cox transformation, which we will call CAMM(BC+2SAL), with AMM(log) whose BIC equals -68,447. The log-transformation might be insufficient in our case. In contrast, the histogram obtained through the estimation of CAMM(BC+2SAL) is much closer to the Gaussian distribution without explicit assumption of the distribution. The proposed transformation is conformed to improve the accuracy of the Gaussian approximation. This section compares the parameters estimated from AMM(log) with CAMM(BC+2SAL). By contrast, Repeat is positively significant at the 1 % level in these models in most areas. Both models estimated that the coefficients on Repeat are SNVCs, which vary spatially and nonspatially depending on the Repeat value. Figure 11 plots the estimated coefficients on Repeat. The two models suggest that shoplifting is repeated in the central area of Tokyo, which is the major commercial area. CAMM(BC+2SAL) estimates a greater gap between the central and the other areas compared to the AMM(log) estimates. CAMM might be useful to accurately capture spatial heterogeneity in regression coefficients while coping with non-Gaussianity. Finally, Table 3 summarizes the estimated median marginal effects. For example, the marginal effect from Repeat of 5.714×10 -2 means that shoplifting per retail (i.e., explained variables value) is expected to increase 5.714×10 -2 if Repeat increases by 1. The marginal effects estimates, which quantify the influence on the raw explained variables 4 4 , might be more interpretable than the naïve coefficient estimates measuring the impact on Z 2 (4 4 ). Based on the better BIC value, CAMM must be used to accurately measure the impact from each explanatory variable. (2020) demonstrated the higher prediction accuracy of AMM compared to the kernel density estimation, which is a popular crime mapping method. The objective here is to examine if our proposed transformation further improves the prediction accuracy. Figure 13 mapped the observed and predicted number of shoplifting occurrences per retail. This figure shows that both seemingly reproduce accurate distribution of the crimes. Figure 14 is a plot with the x-and y-axis representing the observed and predicted values, respectively. AMM(log) tends to underestimate in the large value region. On the other hand, CAMM(BC+2SAL) accurately predicts both small and large observations. The difference might be caused by the fact that CAMM accurately models the upper tail of the data distribution using the SAL transformation while AMM failed to achieve it (see Figure 10 ). In fact, the root mean squared prediction error (RMSPE) of CAMM is 1.461 that is smaller than AMM (RMSPE: 1.604). This study proposed the compositionally-warped additive mixed modeling (CAMM), which is a general framework for fast non-Gaussian regression modeling. Unlike other non-Gaussian additive models, CAMM does not require any explicit assumptions on data distribution. CAMM will be useful to estimate spatial, temporal, and other effects while avoiding misspecification relating to data distribution. The Monte Carlo experiments verify the accuracy and computational efficiency of our approach, and the empirical part confirms its usefulness in practice. The developed approach is potentially applicable to a wide variety of spatial and The computational complexity of evaluating log I 1 +,, (H, ], 0 % ) is independent of N because Eqs. (A1) -(A3) do not include any matrix or vector whose size growth with respect to N. The computation cost for Eqs. (A1) is ®(I E 7 ), which is needed for the inversion in Eq. (A3); the computation cost is quite small. For further detail about the REML for AMM, see Murakami and Griffith (2019) . After all, the REML maximizing log I 1 (H, ], 0 % ) will be fast even for large samples. This appendix explains how to implement the CAMM using an R package "spmoran" (version 0.2.1 onward). Here is the code used in the analysis: meig <-meigen( coords=coords, s_id=s_id ) mod <-resf_vc( meig=meig, y=y, x=x, xgroup = xgroup, x_nvc=TRUE, tr_num=2, tr_nonneg = TRUE ) me <-coef_marginal( mod ) where meigen is the function extracting Moran eigenvectors, which are used to specify the spatially varying intercept and SVCs. Spatial coordinates (coords) are used for the extraction (see Drey et al., 2006; Murakami and Griffith, 2015) . If location ID (s_id) is specified, the Moran eigenvectors are extracted by the ID, which is given by minor municipal district in our case. The meigen function, implementing an eigen-decomposition that is slow for large samples, can be replaced with the meigen_f function, which approximates the Moran eigenvectors. The resf_vc function by default estimates a linear SVC model, which is popular in applied spatial statistics (e.g., Brunsdon et al., 1998) . This function is available to estimate the CAMM with varying coefficients. The inputs are as follows: -meig : The output from the meigen or meigen_f function -y : Vector of explained variables -x : Matrix of explanatory variables -xgroup: One or more group IDs for defining random intercept (ID by quartile in our case) -x_nvc : If FALSE, coefficients are defined by SVCs. If TRUE, they are defined by SNVC. The latter specification is much more stable than naïve SVC models (see Murakami and Griffith., 2020b ). -tr_num: The number of SAL transformations -tr_nonneg: If TRUE, the Box-Cox transformation is applied first, and the SAL transformations after that (i.e., B in Figure 2 Quantifying knowledge spillovers using spatial econometric models Linear mixed models and penalized least squares Coefficient shifts in geographical ecology: an empirical evaluation of spatial and non-spatial regression Demand for environmental quality: a spatial hedonic analysis Geographically weighted regression Extreme coefficients in geographically weighted regression and their effects on mapping Statistics for Spatio-Temporal Data Deep gaussian processes Model-based geostatistics Lognormal kriging-the general case Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM) Isometric logratio transformations for compositional data analysis Non-Gaussian spatiotemporal modelling through scale mixing Geographically Weighted Regression: The Analysis of Spatially Varying Relationships Using geographically weighted regression for environmental justice analysis: Cumulative cancer risks from air toxics in Florida A generalized linear model approach to spatial data analysis and prediction Modelling skewed spatial random fields through the spatial vine copula Spatial Autocorrelation and Spatial Filtering: Gaining Understanding through Theory and Scientific Visualization Citizen science and volunteered geographic information: Overview and typology of participation Crime prediction by data-driven Green's function method Geostatistical interpolation of chemical concentration Bayesian warped Gaussian processes Introduction to Spatial Econometrics The GWmodel R package: further topics for exploring spatial heterogeneity using geographically weighted models. Geo-spatial Information Science Geographically weighted regression with a non-Euclidean distance metric: a case study using hedonic house price data Software packages for Bayesian multilevel modeling Notes on continuous stochastic phenomena Random effects specifications in eigenvector spatial filtering: a simulation study A Moran coefficient-based mixed effects approach to investigate spatially varying relationships Spatially varying coefficient modeling for large datasets: Eliminating N from spatial regressions A memory-free spatial additive mixed modeling for big spatial data Balancing spatial and non-spatial variation in varying coefficient modeling: a remedy for spurious correlation Smoothing parameter selection for a class of semiparametric linear models Compositionally-warped Gaussian processes Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations Examining the association between socio-demographic composition and COVID-19 fatalities in the European region using spatial regression approach Warped gaussian processes Hedonic analysis of housing markets Structured Additive Regression Models: An R Interface to BayesX Multicollinearity and correlation among local regression coefficients in geographically weighted regression Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models Generalized Additive Models: An Introduction with R Tukey g-and-h random fields Multivariate transformed Gaussian processes On spatial skew-Gaussian processes and applications Deep compositional spatial models Vecchia-Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data This work was supported by JSPS KAKENHI Grant Numbers 17H02046, 18H03628, 20K13261. Also, these research results were obtained from the research commissioned by the National Institute of Information and Communications Technology (NICT), Japan. The SAL transformation in each ! ∈ {1, … , '}, it can be replaced with other transformation functions. The transformation functions considered in this study are summarized in Table B1 .