key: cord-0058893-lhg7o5ci authors: André, Lídia Maria; Figueiredo, Ivone; Carvalho, M. Lucília; Simões, Paula; Natário, Isabel title: Spatial Modelling of Black Scabbardfish Fishery Off the Portuguese Coast date: 2020-08-24 journal: Computational Science and Its Applications - ICCSA 2020 DOI: 10.1007/978-3-030-58799-4_25 sha: 4a63ef252d9ddc9b52c30a3ebeb19dcc5fa5437c doc_id: 58893 cord_uid: lhg7o5ci The Black Scabbardfish is a deep-water fish species that lives at depths greater than 700 m. In Portugal mainland, this is an important commercial resource which is exploited by longliners that operate at specific fishing grounds located off the coast. The monitoring of the population status mainly relies on the fishery data as no independent scientific surveys take place. The present work focus on modelling the spatial distribution of the BSF species relative biomass. Georeferenced data given by the location of the fishing hauls and the corresponding catches are available for a set of different vessels that belong to the longline fishing fleet. A classical geostatistical approach was applied to fit a variogram and evaluate the isotropy of the data. Then, different regression models with fixed, structured and unstructured random effects were fitted under a Bayesian framework, considering the Stochastic Partial Differential Equation (SPDE) methodology under the Integrated Nested Laplace Approximation (INLA), addressing some practical implementation issues. The models with spatial effects seemed to perform better, although some practical constraints related to the considered covariates hindered the choice. Expand knowledge about biodiversity and species abundance is an important scientific challenge. One way of obtaining information about fish assemblages, e.g. depth and geographical distributions of species, their density, diversity and effect of fishing, is through research surveys as they provide fishery independent information. For the last 20/30 years, most surveys in Western European waters focused on the study of fishery resources that lived at the continental shelf or upper slope [3] . For monitoring populations of deep-water species, however, the surveys are scarce. In Portugal, the spatial distribution and abundance of Black Scabbardfish (BSF), which is an important commercial fishery for the country, is mostly unknown. BSF is a deep-water fish species that lives at depths greater than 700 m [6] and, therefore, the monitoring of the population status mainly relies on fishery data as no independent scientific surveys take place [14] . In Portugal mainland, it is exploited by longliners that operate at specific fishing grounds along the coast [2] . The distribution of a given species in a particular region as well as the corresponding patterns occur in space. In order to study the natural structure and to understand the functional processes behind them, identifying the relevant spatial dimension at which these all occur is required. Usually, the biological and environmental phenomena can be monitored and measured only at a limited number of spatial locations, even if it is defined continuously over a region. Due to that, geostatistical models focus on inferring a continuous spatial process based on data observed at a finite number of locations, which means that the process which determines the data-locations and the process modelled are stochastically independent. Although some aspects of the spatio-temporal modelling of species distribution has been discussed, for example in [12] , the inclusion of a spatial tendency has not been considered before for the BSF species. Therefore, the present work focus on modelling the spatial distribution of the BSF species relative biomass along the Portuguese coast. Georeferenced data given by the location of the fishing hauls and the corresponding catches are available for a set of different vessels that belong to the longline fishing fleet. A preliminary geostatistical analysis was performed using geoR [4] and the study of the isotropy of the data was performed with spTest [18] . RGeostats [13] was also used to reassure that the fitted variogram was the optimal. Modelling was considered under a Bayesian framework due to its flexibility and because the uncertainty in the model parameters is taken into account. Although extensively used for Bayesian inference, Markov Chain Monte Carlo methods (MCMC) have some computational issues in this context. When dealing with spatial models they may be extremely slow or unfeasible [1] . An alternative method is the Integrated Nested Laplace Approximations approach (INLA) proposed by [16] , which is a deterministic algorithm rather than simulation based such as MCMC. Since these methods have proven to provide accuracy and fast results, they were used, mainly through the package R-INLA. INLA is particularly suited for geostatistical models where it is assumed that a continuous spatial surface underlies the observations, which is very well captured through the Stochastic Partial Differential Equation (SPDE) method proposed by [8] . This methodology represents a continuous Gaussian Field (GF) using a discretely indexed Gaussian Markov Random Field (GMRF), considering a SPDE whose solution is a GF with a Matérn covariance where s i − s j is the Euclidean distance between two locations s i , s j ∈ R 2 , σ 2 is the marginal variance, r is the range and K λ denotes the modified Bessel function of the second kind and order λ > 0; see [1] . Although the solution is a Markov indexed random field when −1 ≤ λ ≤ 1, for R-INLA only λ = 1 was fully tested; see [7] . In spite of its flexibility, the SPDE approach with INLA has also some issues. For instance, the smoothness parameter λ is fixed at 1, which is somehow restrictive for model fit. There are also some concerns regarding the parameterisation of the range parameter. While in conventional geostatistic applications, the range is the point where the correlation reaches 0.05, in R-INLA, the range is the point where the correlation is 0.01. Moreover, this parameter in R-INLA is related to a scale parameter κ through the following equality r = √ 8λ κ . Therefore, special attention has to be given when comparing results from different geostatistical packages, such as geoR or RGeostats. The main fault concerns the choice of the prioris for the hyperparameters of the models and, consequently, good model specifications. In fact, the use of the default priors resulted in a poor fit of the models. This paper is organised as follows: in Sect. 2 the data are described, in Sect. 3 the models are detailed, results are presented in Sect. 4 and discussed in Sect. 5. The data for this study were provided by the Instituto Português do Mar e da Atmosfera (IPMA) and correspond to the BSF catches (in Kg) by fishing haul of the longline fishing fleet, in the fishing grounds of the South zone of Portugal, from 2009 to 2013. The data were divided into two semesters, with semester 1 including the months from March to August and semester 2 from September to February of the following year. Moreover, since the BSF catches are quite higher during the months corresponding to the second semester, the analysis has considered only these semesters. The data also include the location of each fishing haul, Fig. 1 , the corresponding vessel identification and the depth (DEPTH) out of the locations where the fish was captured. Later, the vessels were grouped into three levels according to their tonnage (PRT) that relates to the cargo capacity. The left panel of Fig. 2 displays the sample distribution of the BSF catches. In order to meet the Gaussian assumption required by the traditional geostatistical methods, a Box-Cox transformation of the catches with λ = 1/2 was used (right panel of Fig. 2) : (2) The Kolmogorov-Smirnov test for normality of BSF* accepted this hypothesis, resulting in a p-value of 0.486. If data are isotropic, such that the dependence between sampling locations only relies on the corresponding distance and not on orientation, it greatly simplifies geostatistical models. This has been tested using the non-parametric test proposed by [11] in the R-package spTest [18] . The test is for the null hypothesis which is isotropy, i.e., is a full rank matrix [9] and γ(·) is the semivariogram evaluated at any two spatial lags. First, one has to choose the lags Λ at which anisotropy must be tested. Although this is a subjective choice, it is recommended to choose short lags and to contast points of orthogonal lags. Therefore, The test statistic is given by where b 2 n is a normalising constant, G n is a vector of the true values of the semivariogram at the chosen lags,Ĝ n is the vector of the point estimates of the semivariogram at the chosen lags, Σ is the asymptotic variance covariance matrix ofĜ n andΣ its estimate. The p-value can be obtained from the asymptotic χ 2 distribution with degrees of freedom given by the row rank of A. A grid based block bootstrap approach is applied to estimate Σ. This methodology creates a spatial permutation of sampling locations' blocks [19] . Therefore it is required to choose the window or block size, which can lead to different conclusions as the performance of the test is sensitive to these choices. Moreover, for an irregularly shaped sampling domain, as it is the case, incomplete blocks may complicate the subsampling procedure. Following the recomendations in [18, 19] , blocks with dimensions 8.342 × 15.200 were chosen and a p-value of 0.960 was obtained, i.e., the isotropy hypothesis was not rejected. Let observations BSF * i , i = 1, · · · , n, be modelled as where n is the total number of fishing hauls, σ 2 e is the nugget effect and η i is the response mean. Several linear models were considered to fit the data and variables tonnage (PRT) and depth (DEPTH) were taken as fixed or random effects. The inclusion of a spatial tendency was done through the SPDE approach presented before where an approximated latent GF ξ(s) has a Matérn covariance function as defined in (1). The eleven models considered are presented below and summarised in Table 1. 1. Tonnage is taken as categorical, i.e., where M is the number of groups of PRT. 2. Tonnage is a random effect with a sum to zero constraint, i.e., 3. Depth is taken as a fixed effect, i.e., 4. Tonnage and depth are fixed effects, i.e., 5. Depth is taken as a fixed effect whereas tonnage is a random effect with a sum to zero constraint, i.e., 6. An approximated latent GF ξ(s) with Matérn covariance function is included, i.e., where {ξ g } are zero mean Gaussian distributed weights and A ig is the value ϕ g (s i ) of the g-th deterministic basis function evaluated in s i ; A ig is also the generic element of the sparse n × G matrix A that maps the GMRFξ(s) from the G triangulation vertices of a mesh to the n observation locations; see [1] . 7. Tonnage is categorical and ξ(s) is an approximated latent GF, i.e., 8. Tonnage is taken as random effect, with a sum to zero constraint, and ξ(s) is an approximated latent GF, i.e., 9. Depth is a fixed effect and ξ(s) is an approximated latent GF, i.e., 10. Tonnage and depth are fixed effects and ξ(s) is an approximated latent GF, i.e., 11. Tonnage is taken as random effect with a sum to zero constraint, depth is a fixed effect and ξ(s) is an approximated latent GF, i.e., The analysis started by considering the default priors of R-INLA, i.e., β m , γ ∼ N (0, 0.001 −1 ) and τ e = 1/σ 2 e , τ PRT ∼ Gamma(1, 0.00005). However, an extremely high precision for the random effect, τ PRT , was estimated. This indicated that this parameter is very sensitive to the choice of the prior since that huge variations on the posterior mean were obtained considering different priors. Following [5] , slightly more informative priors on the precision τ PRT were used. However, different priors were considered for this parameter and, in order for the model to converge properly, that is for the INLA algorithm to be able to find the correct posterior mode for the Laplace approximation, the values for the corrected standard deviation for the hyperparameters, which appear in the verbose mode of the model in R-INLA, have to be near the value 1 (between 0.5 and 1.5) [15] . Therefore, for models 2 and 5 (Eqs. (6) and (9), respectively), a truncated normal prior with precision 0.1 was chosen. As for models 8 and 11 (Eqs. (12) and (15) , respectively), a Gamma with parameters 0.5 and 0.5×Var(residuals Model6 ) was assumed. For the spatial parameters, the Penalised Complexity Priors (PC Priors) introduced by [17] were used. These priors are a way to overcome the problem of overfitting and intend to penalise the deviation from a base model, where r = ∞ and σ 2 = 0, by shrinking towards it. Moreover, they do not depend on the spatial design neither change if data are made available at new locations. The PC prior for the range was defined according to some aspects, such as the histogram and the median of the distance between the coordinates of the observations, which was approximately 49 km [20] , and the practical range obtained in the Matérn variogram, which was approximately 20 km. Therefore, and in order to obtain a proper convergence of the model, P [r < 30] = 0.2 was considered the PC prior for the range. In respect of the PC prior for the marginal standard deviation, P [σ > 10] = 0.01 was applied taking into account the variance obtained in the fitted variogram, which was approximately 127.3 ( √ 127.3 ≈ 11). The SPDE approach for spatial modelling relies on a finite element method to approximate a GF by a GMRF, which is computationally feasible. This approximation lies on a triangulation of the spatial domain through a mesh in which basis functions are defined for the method. The construction of the mesh is then an important point of the process. It has to be fine enough and computationally feasible, thus some considerations have to be taken into account. For instance, the triangles in the inner domain have to be as regular as possible and extremely small triangles must be avoided. For these motives, the maximum allowed triangle edge length in the inner domain was 1/3 of the range of the study area and five times more for the outer extension. Moreover, the minimum allowed distance between points was 1/5 of the triangle edge length of the inner extension [20] . The resulting mesh is represented in Fig. 4 . The preliminary analysis was done in geoR, which does not provide a function to find the optimal variogram. In fact, the choice has to be done by eye. A way to overcome this issue is to resort to RGeostats package, which has a function to determine the optimal variogram. In this case, the optimal variogram for the data obtained was a combination of a Cubic and Spherical covariance families. However, in order to be able to compare the results with the analysis undertaken in R-INLA, the smoothness parameter had to be fixed to 1 and the covariance family had to be Matérn. Yet, when comparing the optimal variogram with the Matérn variogram with the parameter fixed to 1, these were very similar, differing mainly in the nugget effect and in the cubic part of the optimal variogram, as can been seen in Table 2 and in Fig. 5 . The described models in Sect. 3 were fitted in R-INLA and the Deviance Information Criterion (DIC) and the Watanabe-Akaike Information Criterion (WAIC) were used as comparison measures. Results summarised in Table 3 . The values for the criteria measures DIC and WAIC for Models 7 (Eq. (11)), 8 (Eq. (12)), 10 (Eq. (14) ) and 11 (Eq. (15) ) are very similar. Despite that, the model with spatial effect and the variable tonnage as a random effect (Model 8 Eq. (12) ) has the lowest values and, therefore, seems to be the best model to fit the data. Furthermore, the variable depth proved to be not relevant considering a 95% credible interval for its effect. This work is a first step in modelling the BSF fishery along the Portuguese coast. The spatial nature of the data was perceived and modelled but, being these fishery data, it is most likely that they suffer from preferentiality of sampling locations, reflecting the fishermen preferences. Incorporating this in analysis is the next step of this study. A few problems arose while adjusting some models and, although they have been addressed in the best way possible, the results presented should be seen with caution. One of the problems concerns the construction of the mesh and how to define the arguments of the function inla.mesh.2d(). This is a problem because, if the mesh is not fine enough, the boundary extension is not big enough or the triangles are not as regular as possible in shape and size, results may not be accurate. On the contrary, the more precise is the mesh, the more computational effort is required. Another issue is related to the model with spatial effect and a categorical variable. In order to map the latent field, the function inla.stack() is used. However, to the best of our knowledge, inla.stack() does not support categorical variables. Several attempts were performed to overcome this problem. First, the categorical variable was incorporated directly in the model and no intercept was defined, which implied that the first level works as the intercept. Then, a design matrix with dummy variables was created. The latter approach was the selected one as it produced the best results. Also, there was a problem concerning the priors. The variable PRT, either as a categorical variable or taken as a random effect, was found to be highly sensitive to the choice of the priori, both in terms of values obtained for the posterior mean and in terms of obtaining a proper convergence of the model. Furthermore, regarding the PC Priors, little variations on the initial values for the range and/or standard deviation caused large variations on the values of the posterior means for those parameters. In fact, when checking the verbose, it was especially difficult to find correct posteriors for the precision of the random effect, for the standard deviation and for the range hyperparameters. These must be better addressed in the future as this is a work in progress. Finally, only linear models were considered. Inspecting the plots of the residuals against the fitted values, the observed relation given by a non-parametric smoother did not justify fitting non-linear models [10] to these data. Spatial and Spatio-Temporal Bayesian Models with R-INLA The fishery for black scabbardfish (aphanopus carbo lowe, 1839) in the Portuguese continental slope Report of the international bottom trawl survey working group Model-Based Geostatistics Black scabbardfish (Aphanopus carbo Lowe, 1839) in the southern northeast atlantic: Considerations on its fishery Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach Testing for directional symmetry in spatial dependence using the periodogram On spline-based approaches to spatial linear regression for geostatistical data Testing for spatial isotropy under general designs Species distribution modeling: a statistical review with focus in spatio-temporal issues MINES ParisTech / ARMINES: RGeostats: The Geostatistical R Package. Free download from A state space model approach for modelling the population dynamics of black scabbardfish in Portuguese mainland waters R-inla discussion group Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations Penalising model component complexity: a principled, practical approach to constructing priors spTest: an R package implementing nonparametric tests of isotropy A review of nonparametric hypothesis tests of isotropy properties in spatial data Begginer's Guide to Spatial. Temporal and Spatio-Temporal Ecological Data Analysis with R-INLA Acknowledgments. This work was supported by national funds through FCT -Fundação para a Ciência e a Tecnologia, I.P., through the project "PREF-ERENTIAL: Improving Spatial Estimation and Survey Design Through Preferential Sampling in Fishery and Biological Applications" with reference PTDC/MAT-STA/28243/2017, project UIDB/00297/2020 (Centro de Matemática e Aplicações) and project UIDB/00006/2020 (Centro de Estatística e Aplicações da Universidade de Lisboa).