key: cord-0869051-x35haoyb
authors: Alshaabi, Thayer; Dewhurst, David R.; Bagrow, James P.; Dodds, Peter S.; Danforth, Christopher M.
title: The sociospatial factors of death: Analyzing effects of geospatially-distributed variables in a Bayesian mortality model for Hong Kong
date: 2021-03-24
journal: PLoS One
DOI: 10.1371/journal.pone.0247795
sha: 16bbcb9354a133cd3ed0fdb957bb857ba2fed125
doc_id: 869051
cord_uid: x35haoyb

Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neighbors is unclear. There is also little information on the temporal components of this relationship. Using the districts of Hong Kong over multiple census years as a case study, we demonstrate that there are differences in how wealth indicator variables are associated with longevity in (a) areas that are affluent but neighbored by socially deprived districts versus (b) wealthy areas surrounded by similarly wealthy districts. We also show that the inclusion of spatially-distributed variables reduces uncertainty in mortality rate predictions in each census year when compared with a baseline model. Our results suggest that geographic mortality models should incorporate nonlocal information (e.g., spatial neighbors) to lower the variance of their mortality estimates, and point to a more in-depth analysis of sociospatial spillover effects on mortality rates.

Although Hong Kong is a small island territory, it exhibits significant variation in occupations, income, foreign inhabitant density, and residence status of workers. In this study, we examine the benefits and drawbacks of incorporating nonlocal and spatial information into a mortality model for a limited area with restricted publicly available data. Simulating a realistic scenario with limited spatial resolution, we show heterogeneity of such exogenous factors and investigate nonlocal behavioral interactions of prosperity and deprivation across neighborhoods.

We present an analytical evaluation comparing local and nonlocal models to show the importance of spatial associations for mortality modeling. In particular, we apply a spatial network technique to examine socioeconomic nonlocality among communities. For instance, we investigate how the magnitude of a socially deprived area can consequently have a nonlocal effect on its neighbors' mortality risks. Similarly, we delve into how the spatial spread of property of an affluent area can spill over to its surrounding areas, and thus affect their longevity. Our work not only reveals the deep influence of these spatial interactions of districts on predicting fatality rates, but also provides a method for investigating systematic inference errors of mortality models.

We structure our paper as follows. We discuss key findings of mortality risk studies in the literature and how they relate to our case examination in the next section. We introduce and analyze our data sources in Sec. 3.1. We summarize the economic and social indicators used in our investigation in S1 File. For our analytical inquiry, we employ a set of Bayesian generalized additive models to predict mortality rates across districts in Hong Kong. We describe our experiments and our exposition of the models in Sec. 3.2. First, we present our local model that does not use any spatial information in Sec. 3.2.1. We compare our Baseline design to two nonlocal spatial models in Sec. 3.3.2. Our first nonlocal model uses spatial features from the nearest neighbours, while the second uses features from all neighbours weighted by their distance to the target area. We refer to the nonlocal models as SP and WSP, respectively. We show our findings in Sec. 4, highlighting the computational complexity of each method and discussing the benefits and shortcomings of each design. Our evaluation also reveals evidence of sociospatial spillovers of mortality rates. We conclude with some remarks on the limitations of our investigation and potential future work.

There are many studies that delve into the temporal dynamics of mortality risks with respect to nation-wide epidemics [1, 2] , pollution [3, 4] , and life expectancy [5] over the last decade. Researchers have hypothesized and identified several connections of longevity, social deprivation, and socioeconomic discrimination [6] [7] [8] . Notably, there are many interpretations of social deprivation. Messer et al.'s study [9] offers a well-written overview of socioeconomic deprivation in the literature. The authors highlight the limitations of such definitions and propose an alternative method to calculate and standardize what they call a "neighborhood deprivation index" (NDI). Employing principal components analysis (PCA) on census data from 1995 to 2001, they illustrate the effectiveness of their proposed measurement at capturing socioeconomically deprived counties in the US.

Others have investigated a wide range of socioeconomic, psychological, and behavioral factors of fatality risks [10]. We often examine the notion of disparity in health and mortality risks using population-scale inputs and sensitive individual variables such as age, race, and gender, while respecting the privacy concerns that emerge from such applications [11]. Ou et al. [12] infer socioeconomic status by type of housing, education, and occupation. They find that regions with lower socioeconomic status have higher rates of air pollution. They also report that neighborhoods with higher densities of blue-collar workers have higher rates of air pollution-associated fatality than others. Chung et al. [13] present evidence of inequalities conditioned on age as a control variable. The authors investigate the impact of socioeconomic status amid the rapid economic development of Hong Kong. Their findings suggest a decline in socioeconomic disparity in mortality risks across the districts of Hong Kong from 1976 to 2010. They also show that various health benefits brought by economic growth are greater for regions with higher socioeconomic status. The market share of health benefits is unequally distributed among groups of varying status: individuals with higher socioeconomic status have access to greater benefits than those of lower socioeconomic status. In the present study, we use a set of socioeconomic attributes including income, unemployment, and mobility to define and capture some of the ramifications of social deprivation in Hong Kong.

Spatial associations between income disparity and health risks are widely understood both internationally and for individual cities and states [14] [15] [16] [17] [18] [19] [20]. Local attributes play a powerful role in the model dynamics, given the assumption that socioeconomic factors vary geographically. Studies have shown the importance of spatial associations in identifying relations between socioeconomic deprivation and longevity. Although researchers have examined income inequality, they often use a spatially localized approach in their investigations [21] [22] [23] [24] [25]. Geographically weighted regression (GWR) is a commonly used method designed to examine spatial associations [26, 27]. Fotheringham et al. argue that socioeconomic features are intrinsically intra-connected over space because of the mechanisms by which communities develop. Their study makes an empirical comparison of their proposed method (GWR) to other stationary regression models to investigate the spatial distribution of long-term illness in the UK.

Others have looked into the spatial association between air pollution and mortality in Hong Kong [28] , Czechia [29] , Rome [30] , and France [31] . Cossman et al. [32] examine the spatial distribution of mortality rates over 35 years, starting from 1968 to 2002 across all counties in the US. The study highlights a nonrandom pattern of clustering in mortality rates in the US, where high fatality rates are primarily driven by economic decline.

To assess geospatial associations between pollution and mortality in Hong Kong, Thach et al. [33] examine the spatial interactions of tertiary planning units (TPUs) [34], which are similar to census blocks in the US. The authors show a positive spatial correlation between mortality rates and seasonal thermal changes in Hong Kong. They argue that the variation between TPUs is a key factor for cause-specific fatality rates. Their results show that socioeconomically deprived regions have higher fatality rates, especially during winter.

The study of relative spatial interactions of social and economic indicators dates back decades. Researchers have measured nonlocal and/or interdependent interactions of inequality in life expectancy [35], health care [36], education [37, 38], and decision-making [39, 40]. Many methods have been proposed to identify and examine broader dimensions of inequality from a spatial point of view, such as Moran's I and spatial auto-regression [41] [42] [43]. Yang et al. [44] argue that mortality rates of counties in the US are associated with social and economic aspects found in neighboring counties. Their findings suggest that fatality rates in a county are remarkably driven by social signals from bordering counties because of the spillover of socioeconomic wealth or social deprivation across neighborhoods. Another recent work by Holtz et al. [45] highlights the significant influence of nonlocal interactions and spillovers on regional policies regarding the global outbreak of COVID-19. Employing a network-based approach to explore the dynamics of communities and their impact on mortality risks, we present here a small case study using a collection of datasets from Hong Kong. In our study, we use three different models to illustrate the role of spatial associations by comparing models with spatial features to a baseline model without spatial factors.

Census data. We collected socioeconomic variables curated by the Census and Statistics Department of Hong Kong [46]. We have three snapshots at 5-year intervals: 2006, 2011, and 2016. For each year, the dataset includes the total population density by district, median income, median rent to income ratio, median monthly household income, unemployment rates, unemployment rates across households, unemployment rates among minorities, the proportion of homeless people, the proportion of homeless mobile residents, the proportion of single parents, the proportion of households with children in school, the proportion of households with children (aged under 15), and the proportion of households with elderly (aged 65 or above).

Mortality data. We use the official counts of known and registered deaths provided by the Census and Statistics Department of Hong Kong [47]. The data set contains 892,055 death records between 1995 and 2017. Every record includes a wide range of information such as age, gender, and place of residence (TPU) [34], a geospatial reference system used to report population census statistics. Our mortality records have both place of occurrence and place of residence. Each of those spatial features is certainly important to paint a better picture of the microscopic spatial associations of mortality and other social and economic factors. For our study, however, we only use place of residence as our primary geographic unit and discard all years other than the census years to cross-reference our death records with other socioeconomic characteristics for each district. We derive an annual crude death rate of each district by simply dividing the number of registered death records by the population size for each calendar year such that:

$$Y_i^t = \frac{\text{registered deaths in district } i \text{ in year } t}{\text{population of district } i \text{ in year } t}$$

for district $i$ and $t \in \{2006, 2011, 2016\}$.

Life insurance data. We have obtained data from a Hong Kong based life insurance provider. According to the Hong Kong Insurance Authority [48], our provider had roughly 2.5% market share of all non-linked individual life insurance policies issued in Hong Kong in 2016. We normalize the number of policies issued at the district level by population size for each time snapshot to report the proportion of individuals insured by each district. Notably, our variable is limited to policies sold by a single company and is thus affected by the sociospatial features of the company's market share, such as the spatial sparsity of its agents and offices, and the social characteristics of consumers who would choose our provider over other life insurance providers in the area. However, statistical and detailed data sources regarding life insurance policies are often proprietary, especially with a similar spatial resolution to the one presented in our study. Although our data on life insurance policies may not represent the full population, it provides an example of the data that an insurance company can use to build their models. Given the scarcity of such data, we use our records of life insurance policies as a useful complementary wealth indicator, among other variables such as income, rent, and unemployment, which is absent from most studies.
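The two per-district normalizations described above (crude death rate and proportion insured) can be sketched as follows. This is a minimal illustration only: the district names, counts, and function name are hypothetical and are not values or code from the study.

```python
def crude_death_rate(deaths, population):
    """Registered deaths divided by population size for one district-year."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population


# Hypothetical counts, for illustration only (not values from the study).
records = {
    ("District A", 2016): {"deaths": 1200, "population": 240000, "policies": 600},
    ("District B", 2016): {"deaths": 900, "population": 300000, "policies": 1500},
}

# Crude death rate per district and census year.
rates = {
    key: crude_death_rate(row["deaths"], row["population"])
    for key, row in records.items()
}

# Proportion of residents insured, using the same population denominator.
insured = {
    key: row["policies"] / row["population"]
    for key, row in records.items()
}
```

Both quantities share the same denominator, so they are directly comparable across districts of very different sizes.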

Geospatial unit. Initially, we planned to use TPUs as the main geospatial units to cross-reference our data sources. However, we identified a large subset of missing TPUs in death records, as records in small TPUs may reveal sensitive information about specific individuals there. To avoid any risk of identifying individuals in the data set, we use districts as our main spatial unit of analysis [49]. This choice is consistent with prior work, where most studies have either filtered out small TPUs in their analyses [13, 28] or aggregated their records at the district level [33] to overcome this challenge.

Categorization. We organize our features into three different categories.

1. Base: This set has most of the socioeconomic features in our data sets such as population density, unemployment rates, the proportion of homeless people, mobile residents, and single-parent households. However, we do not include wealth-, age-, or race-related features here.

2. Wealth: Besides the base features described above, we include median income, median rent-to-income ratio, median monthly household income, and life insurance coverage by the district.

3. All: This set includes all features in our data set, including variables that are sensitive from a sociopolitical perspective, such as the proportion of minorities and unemployment rates among minorities. This set also includes age-related features such as the proportion of young and elderly residents for each district.

There are many statistical modeling paradigms to tackle this task, each of which comes with its own costs and benefits. Researchers sometimes use Poisson models to investigate mortality risks [1, 4, 28, 33]. Others use general linear or multivariate regression models [31, 35, 50]. We can also see modifications of this family of approaches in the literature such as geographically weighted regression [26, 27]. In this study, however, we consider the simplest approach. We use a set of three Bayesian multivariate linear regression models. Our goal is to examine the addition of nonlocal information, regardless of the distributional assumptions placed on the response variable Y. Therefore, we keep our model as simple as possible to allow us to investigate the implications of two different spatial models compared to a baseline model that does not factor in any nonlocal information. We treat the design tensors $\tilde{X}$ as exogenous variables and do not model their evolution across time. A "design tensor" is a rank 3 tensor given by

$$\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_T), \qquad \tilde{X}_t \in \mathbb{R}^{N \times (p + 1)},$$

where N is the number of observations which accounts for 18 districts in Hong Kong, and p is the number of predictors. We add an extra variable to the design matrix to account for a constant in our linear model.

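The dynamics of the local models are described by a system of linear equations,

$$\tilde{y}_t = \tilde{X}_t \tilde{\beta}_t + \sigma_t \tilde{u}_t, \qquad (1)$$

$$\tilde{\beta}_t = \tilde{\beta}_{t-1} + \tilde{\mu} + \tilde{L} \tilde{v}_t, \qquad (2)$$

$$\log \sigma_t = \log \sigma_{t-1} + \mu + \ell w_t, \qquad (3)$$

for $t = 1, \ldots, T$. Eq 1 is an ordinary linear model for the response vector $\tilde{y}_t$ as a function of the design matrix $\tilde{X}_t$ and coefficients $\tilde{\beta}_t$. We presently define the quantities that compose Eqs 2 and 3. We set

$$\tilde{u}_t, \tilde{v}_t \sim \text{MultivariateNormal}(\tilde{0}, \tilde{I}) \qquad (4)$$

in Eqs 1 and 2, while $w_t \sim \text{Normal}(0, 1)$. Our identity matrix $\tilde{I}$ is informed by the number of predictors in our model, and has a dimension of $(p + 1) \times (p + 1)$. Hence the model likelihood is

$$p(\tilde{y}_t \mid \tilde{\beta}_t, \sigma_t) = \text{MultivariateNormal}(\tilde{X}_t \tilde{\beta}_t, \sigma_t^2 \tilde{I}). \qquad (5)$$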

A graphical model corresponding to Eq 5 is displayed in Fig 1. We also a priori believe that $\tilde{\beta}_t$ does not remain constant throughout the time under study, though we are unsure of exactly how it changes over this time. Thus, we assume a prior on $\tilde{\beta}_t$ that evolves as a biased random walk with drift given by $\tilde{\mu}$ and correlation matrix $\tilde{\Sigma}$ with Cholesky decomposition $\tilde{L}$. Likewise, we suppose that $\log \sigma_t$ evolves according to a univariate random walk with drift given by $\mu$ and standard deviation $\ell$. We make this assumption for the same reason: we do not believe it is likely that $\sigma_t$ remains constant over the entire time period of study. The random walk priors for $\tilde{\beta}_t$ and $\sigma_t$ are each centered about zero because we impose a zero mean prior on $\tilde{\mu}$ and $\mu$. We initialize these random walks with zero-centered multivariate normal initial conditions.

The distribution of $\tilde{\beta}_{1:T} \equiv (\tilde{\beta}_1, \ldots, \tilde{\beta}_T)$ is thus given by

$$p(\tilde{\beta}_{1:T} \mid \tilde{\beta}_0, \tilde{\mu}, \tilde{\Sigma}) = \prod_{t=1}^{T} \text{MultivariateNormal}(\tilde{\beta}_{t-1} + \tilde{\mu}, \tilde{\Sigma}), \qquad (9)$$

with an analogous (univariate) distribution holding for $\log \sigma$. We set

$$\tilde{\mu} \sim \text{MultivariateNormal}(\tilde{0}, \tilde{I}), \qquad (10)$$

so that the prior distributions over the paths of the regression coefficients and log standard deviation are centered about zero (the null hypothesis) for all $t$.
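To make the random walk priors more concrete, the following sketch draws one prior path for a single regression coefficient and one for the log standard deviation. It is a scalar simplification, not the multivariate model: the unit variances for the drift, initial condition, and increments, as well as the function name `sample_random_walk`, are assumptions made purely for illustration.

```python
import math
import random


def sample_random_walk(num_steps, drift, scale, start, rng):
    """Return x_1..x_T where x_t = x_{t-1} + drift + scale * noise_t."""
    path = []
    current = start
    for _ in range(num_steps):
        current = current + drift + scale * rng.gauss(0.0, 1.0)
        path.append(current)
    return path


rng = random.Random(0)  # fixed seed so the sketch is reproducible
num_census_years = 3    # 2006, 2011, 2016

# Zero-centered draws for the drift and the initial condition.
# Unit variances are an assumption made for illustration.
coef_drift = rng.gauss(0.0, 1.0)
coef_start = rng.gauss(0.0, 1.0)
coef_path = sample_random_walk(num_census_years, coef_drift, 1.0, coef_start, rng)

log_sigma_drift = rng.gauss(0.0, 1.0)
log_sigma_start = rng.gauss(0.0, 1.0)
log_sigma_path = sample_random_walk(
    num_census_years, log_sigma_drift, 1.0, log_sigma_start, rng
)

# Exponentiating keeps the noise scale positive at every time step.
sigma_path = [math.exp(value) for value in log_sigma_path]
```

Because the drift itself is drawn from a zero-mean distribution, paths sampled this way have no preferred direction a priori, which is the sense in which the prior is centered on the null hypothesis.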

We place a uniform prior (LKJ(1)) over the correlation component of $\tilde{\Sigma}$. The mean of this prior is at the identity matrix. The vector of standard deviations of $\tilde{\Sigma}$, $\tilde{\sigma}$, is hypothesized to follow an isotropic multivariate log normal distribution as $\tilde{\sigma} \sim \text{LogNormal}(0, 1)$. We also place a univariate LogNormal(0, 1) prior on $\ell$, the standard deviation of the increments of $\log \sigma_t$. We make this choice because we do not possess prior information about the appropriate noise scale of $\tilde{\beta}_t$ or $\log \sigma_t$, and the log normal distribution is a weakly-informative prior that does not encode much prior information about their noise scales. We did not perform exact inference of this model but rather fit parameters of a surrogate variational posterior distribution. We use variational inference to approximate the posterior probability of the design tensor and ultimately run Bayesian inference over these features [51, 52]. Although traditional methods such as Markov chain Monte Carlo (MCMC) sampling can offer guarantees of accurate sampling from the target density, they do not necessarily outperform variational inference in terms of accuracy [53] [54] [55]. Evaluating the costs and benefits for accurate estimation of posteriors is indeed an open area of research; however, variational inference offers a much faster, effective method to approximate probability densities through optimization, even for small datasets [56, 57]. Using variational inference also allows us to have an agile development cycle and flexible models, as access to more economic and social data features will continue to evolve and change over time. The effectiveness and versatility of variational inference have left a remarkable positive impact across disciplines [58] [59] [60] [61] [62].

Denoting the vector of all latent random variables by $\tilde{z}$, we fit the parameters $\psi$ of an approximate posterior distribution $q_\psi(\tilde{z})$ to maximize the variational lower bound, defined as the expectation under $q_\psi(\tilde{z})$ of the difference between the log joint probability and $\log q_\psi(\tilde{z})$ [52]. We chose a low-rank multivariate normal guide with rank equal to approximately $\sqrt{\dim \tilde{z}}$. This low-rank approximation allows for modeling of correlations in the posterior distribution of $\tilde{z}$ with a lower number of parameters than, for example, a full-rank multivariate normal guide distribution. All bounded latent random variables are reparameterized to lie in an unconstrained space so that we could approximate them with the multivariate normal guide.
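The saving from a low-rank guide can be illustrated with a simple parameter count. The sketch below assumes the common parameterization of a low-rank multivariate normal (a mean vector, a `dim x rank` covariance factor, and a diagonal term) and compares it with a full-rank guide parameterized by a mean vector and a lower-triangular Cholesky factor. The latent dimension of 100 is hypothetical and the helper names are ours.

```python
import math


def guide_rank(latent_dim):
    """Rank close to the square root of the latent dimension."""
    return max(1, round(math.sqrt(latent_dim)))


def low_rank_param_count(latent_dim, rank):
    """Mean vector + (latent_dim x rank) covariance factor + diagonal term."""
    return latent_dim + latent_dim * rank + latent_dim


def full_rank_param_count(latent_dim):
    """Mean vector + lower-triangular Cholesky factor of the covariance."""
    return latent_dim + latent_dim * (latent_dim + 1) // 2


latent_dim = 100  # hypothetical size of the latent vector
rank = guide_rank(latent_dim)
low_rank_total = low_rank_param_count(latent_dim, rank)
full_rank_total = full_rank_param_count(latent_dim)
```

With a rank near the square root of the dimension, the low-rank count grows roughly as the dimension to the power 1.5, compared with quadratic growth for the full-rank guide.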

We use the road networks in Hong Kong to build a spatial network of the 18 districts [63, 64]. Each node in the network represents a single district. Nodes are linked if they share a direct road or bridge. In Fig 2A, we show a map of Hong Kong's districts. We display an undirected network of districts in Fig 2B. By contrast, we show a fully connected version of the network in Fig 2C. Edges are weighted by their spatial distance, measured by the length $d_{ij}$ of the shortest path from district $i$ to district $j$, and weights decay exponentially as the length of the shortest path increases between any two districts.
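As a rough sketch of this weighting scheme, the code below computes shortest-path lengths on a toy undirected network and converts them to exponentially decaying weights. Two assumptions are ours and not confirmed by the text: path length is measured in hops (number of links), and the weight takes the specific form `exp(-d_ij)` with no extra scale parameter. The four-node network is hypothetical.

```python
import math
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def exponential_weights(adjacency):
    """Weight every reachable pair (i, j) by exp(-d_ij); assumed decay form."""
    weights = {}
    for i in adjacency:
        dist = shortest_path_lengths(adjacency, i)
        weights[i] = {j: math.exp(-d) for j, d in dist.items() if j != i}
    return weights


# Hypothetical chain of four districts: A - B - C - D.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
weights = exponential_weights(toy_network)
```

In this toy chain, district A gives its direct neighbor B the largest weight, and the weight shrinks for C and again for D, which mirrors the fully connected, distance-weighted view of the network.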

Similarly, we treat the adjacency matrix $\tilde{A}$ as an exogenous variable. We fit two nonlocal models that leverage the design matrix associated with each district's neighbors: a spatial model (SP) that uses the binary adjacency matrix and a weighted spatial model (WSP) that uses the weighted adjacency matrix. The equations describing the time evolution of this data generating process are

$$\tilde{y}_t = \tilde{X}_t \tilde{\beta}_t + \tilde{f}(\tilde{X}_t \otimes \tilde{B}) \tilde{\gamma}_t + \sigma_t \tilde{u}_t, \qquad (14)$$

$$\tilde{\beta}_t = \tilde{\beta}_{t-1} + \tilde{\mu} + \tilde{L} \tilde{v}_t, \qquad (15)$$

for $t = 1, \ldots, T$. The rank three tensor $\tilde{X}_t \otimes \tilde{B}$ is the outer product of $\tilde{B} \equiv \tilde{A} - \tilde{I}$ with the design matrix $\tilde{X}_t$. The function $\tilde{f}$ is a reduction function that lowers the rank of the tensor by one by collapsing the first dimension. Here we take $\tilde{f}$ to be the mean across the first dimension.

In other words, $\tilde{f}(\tilde{X}_t \otimes \tilde{B})$ is a design matrix where $\tilde{f}(\tilde{X}_t \otimes \tilde{B})_{ij}$ is the average of the values of predictor $j$ over all the neighbors of district $i$ in the network.

The prior distributions for $\tilde{\gamma}_t$ and for its drift, Cholesky factor, and noise term $\tilde{q}_t$ are identical to those for $\tilde{\beta}_t$, $\tilde{\mu}$, $\tilde{L}$, and $\tilde{u}_t$, except that their dimensionality is lowered from $p + 1$ to $p$ since we do not include an intercept term in $\tilde{f}(\tilde{X}_t \otimes \tilde{B})$.
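The neighbor-averaged design matrix described above can be sketched directly for the binary (SP) case. This is an illustrative reading of the reduction, not the authors' implementation: the function name, the toy values, and the choice to return zeros for a district with no neighbors are all assumptions.

```python
def neighbor_mean_features(features, adjacency):
    """Average each predictor over the neighbors of every district.

    features:  list of N rows, each holding p predictor values (no intercept).
    adjacency: N x N binary matrix; the diagonal is ignored, which matches
               removing self-links by subtracting the identity.
    """
    n = len(features)
    p = len(features[0])
    result = []
    for i in range(n):
        neighbors = [j for j in range(n) if j != i and adjacency[i][j]]
        if not neighbors:
            # Assumption: an isolated district contributes no spatial signal.
            result.append([0.0] * p)
            continue
        row = [
            sum(features[j][k] for j in neighbors) / len(neighbors)
            for k in range(p)
        ]
        result.append(row)
    return result


# Hypothetical example: three districts in a line, two predictors each.
toy_adjacency = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
toy_features = [
    [1.0, 10.0],
    [4.0, 20.0],
    [7.0, 60.0],
]
spatial_features = neighbor_mean_features(toy_features, toy_adjacency)
```

The middle district averages over both ends of the line, while each end district simply inherits the values of the middle one, so entry (i, j) is the mean of predictor j over the neighbors of district i.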

We use Pyro [65], a probabilistic programming language that operates on top of PyTorch [66], a dynamic graph differentiable programming library, to implement our models. Our source code along with our documentation is publicly available online on our GitLab repository at: https://gitlab.com/compstorylab/asis.

In Fig 3, we display the spatial distribution of socioeconomic characteristics for 2006, 2011, and 2016. Each heatmap is normalized by the mean and standard deviation for each year, such that darker shades of red show areas above the mean for each of these variables, while shades of grey show areas below the mean. We show normalized population density in Fig 3A through 3C. We see dense clusters both at the center of the country and on the northwestern side.

We are primarily interested in the geospatial trend across different variables/predictors for each year. For example, our heatmaps in Fig 3D-3F show that the southern islands have higher mortality rates than the average rates of Hong Kong. The southern islands had higher rates of new life insurance policies in 2006 (Fig 3G), followed by consistently lower rates than average when compared to the rest of the districts in Hong Kong in each subsequent year (see Fig 3H and 3I).

The northwestern territories have higher rates of unemployment compared to the southeastern side of Hong Kong, as we see in Fig 3J-3L. In Fig 3M-3O, we observe that the east and center districts have higher normalized median income when adjusted for inflation. We display additional statistics regarding households in S2 Fig in S1 File.

Although we have similar and simple building blocks for our models, they do scale differently in terms of their computational costs. The total number of parameters in our Baseline model is equal to $pN + 1$, where $p$ is the number of predictors we use for each district. Besides the set of predictors for each target district, our spatial model SP uses spatial features from the nearest neighbours (ego-network) to that district. Thus, the total number of parameters used in the SP model is $\propto pN\bar{C}$, where $\bar{C}$ is the average clustering coefficient of the network. Our WSP model leverages features from all neighbours weighted by their distance to the target district. It has the largest number of parameters, which is proportional to $pN^2$. The relative difference in the number of parameters among these models urges us to further investigate the benefits of expanding the models with spatial features.

To evaluate our models, we consider two metrics: mean absolute error (MAE) to estimate our margin of error and mean signed deviation (MSD) to examine systematic bias. In Table 1, we report the mean absolute error defined such that:

$$\epsilon_t = \frac{1}{N} \sum_{i=1}^{18} \left| \hat{Y}_i^t - Y_i^t \right|$$

for each calendar year $t \in \{2006, 2011, 2016\}$ in our dataset across all districts. We highlight cells in blue to show the model with the lowest margin of error, and red to indicate the best model for all years. We also color cells in grey to demonstrate a tie between two models for a year. For our default set of features (Base), we note our WSP model outperformed the rest of the models in most districts. However, as we add more predictors to the models, we observe a pattern whereby models with fewer parameters perform better. Our results show the SP model has the lowest MAE across districts when we use the Wealth category, which has 11 predictors including some wealth-related features as described in Sec. 3.1. The Baseline model, which does not account for spatial information, has a lower MAE when we feed all predictors to the models. This is an expected behavior because our larger and richer spatial models get overwhelmed with too many parameters and very limited data points. We show a detailed breakdown for each model and each district for the calendar year 2016 in Table 2.
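The two evaluation metrics named above can be written in a few lines. The sketch below uses hypothetical predicted and observed rates for three districts in a single year; the function names are ours.

```python
def mean_absolute_error(predicted, observed):
    """Average magnitude of the prediction error across districts."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def mean_signed_deviation(predicted, observed):
    """Average signed error; positive values indicate overestimation."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical rates for three districts in one census year.
observed_rates = [5.0, 6.0, 7.0]
predicted_rates = [5.5, 5.5, 7.5]

mae = mean_absolute_error(predicted_rates, observed_rates)
msd = mean_signed_deviation(predicted_rates, observed_rates)
```

In this toy case every district is off by 0.5 in magnitude, but one error is negative, so the signed deviation is much smaller than the absolute error. This is why the two metrics are reported together: the first measures the margin of error, the second reveals whether errors share a direction.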

We also compute a probability density function (PDF) of signed deviation ($\hat{Y}_i^t - Y_i^t$) for each model, which is possible since our models are fully Bayesian and generate a distribution of possible outcomes. If the models accurately associate features with observed mortality rate, the distributions would be centered on zero. Conversely, if the models display systematic over- or under-estimation of mortality rate, the distributions will diverge away from zero, whereby negative numbers show underestimation and positive numbers indicate overestimation.

In Fig 4, we display the empirical distributions of signed deviation to examine the relative likelihood of systematic bias for models trained on the default set of features in 2016. We assess significance of model coefficients using centered Q% credible intervals (CI). A centered Q% credible interval of a probability density function $p(x)$ is an interval $(a, b)$ defined such that

$$\int_{-\infty}^{a} p(x)\,dx = \int_{b}^{\infty} p(x)\,dx = \frac{1}{2}\left(1 - \frac{Q}{100}\right).$$

We measure the significance of systematic errors in each model by computing the 80% CI, whereby systematic overestimation is highlighted in orange (CI > 0), and systematic underestimation is colored in blue (CI < 0).
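A centered credible interval can be estimated from posterior samples by taking equal-tailed empirical quantiles, as in the sketch below. The linear-interpolation quantile rule, the function names, and the labels returned by `bias_label` are our assumptions for illustration; the sample values are hypothetical.

```python
def centered_credible_interval(samples, q_percent):
    """Equal-tailed interval leaving (100 - q_percent) / 2 percent per tail."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    tail = (1.0 - q_percent / 100.0) / 2.0

    def quantile(prob):
        # Linear interpolation between neighboring order statistics.
        position = prob * (len(ordered) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        fraction = position - lower
        return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction

    return quantile(tail), quantile(1.0 - tail)


def bias_label(samples, q_percent=80):
    """Classify signed-deviation samples by where the interval sits."""
    low, high = centered_credible_interval(samples, q_percent)
    if low > 0:
        return "overestimation"
    if high < 0:
        return "underestimation"
    return "no systematic bias"


# Hypothetical signed-deviation samples for one district.
interval = centered_credible_interval(list(range(101)), 80)
label = bias_label([1.0, 2.0, 3.0, 4.0, 5.0])
```

An interval lying entirely above zero corresponds to the orange (overestimation) case, and one lying entirely below zero to the blue (underestimation) case.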

We note that our spatial models, especially the weighted spatial model, are effective at reducing systematic over- and under-estimation. For example, the spatial models reduce the margin of error in panels B, H, L, and R of Fig 4. By contrast, all three models either overshoot or undershoot mortality rates drastically in a few districts (see panels D, E, M, and Q in Fig 4).

Our models also provide evidence to suggest that there are significant relationships between socioeconomic variables, such as household unemployment and percentage of single parents, and mortality rate. Many of these relations are significant in each of the census periods under study (2006, 2011, and 2016), while other relations are significant for some census periods but not others. We display the distributions of βs, the parameter for the baseline model, in S6 Fig in S1 File. For each panel, we show the kernel density estimation of β as a function of each variable in the design tensor. We highlight distributions that are significantly above 0 using an orange color, while distributions significantly below 0 are colored in blue as measured by the 80% CI. Similar demonstrations of spatial and weighted spatial models can be found in S7 and S8 Figs in S1 File, respectively. Besides the distributions of βs, we also show the kernel density estimation of γs, the hyperparameter used for the spatial component in each model, in S9 Fig in S1 File. All three models are fairly accurate nonetheless. Access to a wider range of predictors and longitudinal data will help reduce our margin of error in estimating mortality rates. However, the spatial models allow us to capture nonlocal and interdependent interactions among the social and economic features across districts that would not be possible otherwise using the baseline model.

Table 1. Model evaluation. We report the mean absolute error, $\epsilon_t = \frac{1}{N} \sum_{i=1}^{18} | \hat{Y}_i^t - Y_i^t |$, for each model across all districts averaged over 1,000 trials. The cells colored in blue show the model with the lowest margin of error for each feature category, whereas grey cells demonstrate ties among two models. The model highlighted in red indicates the model with the lowest margin of error.

The districts of Sai Kung, Sha Tin, Wong Tai Sin, and Southern are poorly fit by our models in 2016. In Fig 5, we inspect the characteristics of each district and its neighbors. The first three rows show how all three models overestimate mortality rates for the Sai Kung and Sha Tin districts (Fig 5A and 5B), while underestimating the other two districts (Fig 5C and 5D). For each district, we plot the standardized score of some socioeconomic features (black markers). We also display the corresponding average value of the same features from the neighboring districts derived from our network and denoted with orange markers to examine the effect of neighborhoods on these areas. The first two districts (Fig 5A and 5B) have lower mortality rates than the average mortality of Hong Kong. They are significantly overestimated by our models, while mortality rates in their neighboring districts hover around the average mortality rate of Hong Kong. The other two districts (Fig 5C and 5D) have higher mortality rates and are surrounded by districts with lower mortality rates. The qualitative difference among these districts provides additional evidence of sociospatial and economic spillovers in mortality risks. We can see spillovers of wealth in Fig 5A and 5B, with a higher median income in the district being associated with a higher median income in its neighborhood. Though districts in Fig 5C and 5D are located in a wealthy neighborhood, they still have a lower median income. The Wong Tai Sin district has a lower number of mobile residents than average (see Fig 5C). In Fig 5D, we see an extraordinarily higher number of homeless people compared to the neighboring population. A higher percentage of minorities can also drive our systematic errors in this district, which hints at disparities in mortality risks in the area. We would need further analyses with finer geospatial resolution to explain this behavior.

We also analyze our signed deviation distributions through a local ego network approach. An ego network of a node is the network comprising that node and its nearest neighbors. Here each node is a district and its neighbors are that district's neighboring districts. The joint distribution of mortality at time t and district i and the model mortality rate prediction conditioned on location at node i can be concisely represented in the form of an ego network for each district. We display an example of this representation in Fig 6. We color nodes by standardized mortality rate and edges by standardized signed deviation error. Though this is an exploratory method that deserves greater expansion and attention in future work, we note qualitatively that the local view of predicted mortality versus true mortality varies substantially as a function of district. For example, the district of Wan Chai is connected to four other districts, three of which have substantially higher mortality than average and in particular higher mortality than Wan Chai itself, which has lower mortality than average. The model predictions for these neighbors are lower than their true mortality. An observer in Wan Chai who has access only to the model predictions and not the true mortality data would rationally assume that mortality in these districts is much lower than it truly is and could subsequently make further inferences or decisions based on this faulty-but-rational assumption. Notably, socioeconomic diversity or heterogeneity of a neighborhood can change the local perception of mortality models across neighboring communities. We observe that high divergence from the neighborhood is associated with higher rates of uncertainty in the models. We envision future work incorporating this sort of information to fine-tune mortality models.
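The ego network view can be assembled from an adjacency list and two per-district scores, as sketched below. The district labels and values are hypothetical, and we assume (this is not confirmed by the text) that each edge from the ego to a neighbor carries that neighbor's standardized signed deviation.

```python
def ego_view(adjacency, mortality_z, signed_error_z, ego):
    """Collect node and edge attributes for one district's ego network.

    Nodes carry standardized mortality rates. Each ego-to-neighbor edge
    carries the neighbor's standardized signed deviation (an assumption).
    """
    neighbors = sorted(adjacency[ego])
    return {
        "ego": ego,
        "nodes": {d: mortality_z[d] for d in [ego] + neighbors},
        "edges": {(ego, d): signed_error_z[d] for d in neighbors},
    }


# Hypothetical network and scores, for illustration only.
toy_adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
toy_mortality_z = {"A": -0.8, "B": 1.1, "C": 0.9, "D": 0.2}
toy_signed_error_z = {"A": 0.1, "B": -1.2, "C": -0.7, "D": 0.4}

view = ego_view(toy_adjacency, toy_mortality_z, toy_signed_error_z, "A")
```

In this toy example the ego district has below-average mortality while both neighbors are above average and underestimated (negative signed deviation), which reproduces the kind of local mismatch described for an observer who sees only the model predictions.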

Data-driven models are powerful tools often used to inform and reshape cultural, political, and financial policies around the globe. However, data scarcity and data sparsity pose an enormous challenge for some domains such as mortality modeling, especially for small territories. In this work, we studied the implications of these constraints for the development of mortality models in Hong Kong, where access to publicly available data sources is restricted. We carried out a set of experiments to identify and explore how nonlocal and sociospatial interactions can systematically influence the outcome of a mortality model.

Our results support our hypothesis that spatial associations of wealth or social deprivation among neighborhoods have a direct and sometimes substantial impact on mortality risks. Our examination reveals that localized models, which do not account for sociospatial factors, can systematically over- or underestimate mortality rates, while spatial models reduce the error of predicting mortality rates. In our investigation, we show how our models scale differently regarding their complexity and statistical inference. We illustrate how the local perception of predicted mortality varies qualitatively and substantially as a function of the spatial unit. Future work can also improve upon our exploratory method to study spatial interdependency of social and economic factors of longevity, and identify sociospatial spillovers across neighborhoods and communities.

Fig 6. Ego networks of each district demonstrating sociospatial factors of mortality for the 2016 weighted spatial model. We display ego networks of each district in Hong Kong and its nearest neighbors in the road and bridge network. The central node (highlighted with a grey box) of each network corresponds to the labelled district. Neighbors are not arranged around the ego district geographically. Node color corresponds to normalized mortality rate and edge color corresponds to signed prediction error for the 2016 WSP model. These ego networks encode a qualitative measure of the sociospatial factors in mortality modeling. We display the equivalent networks for the Baseline and SP models in S10 and S11 Figs in S1 File, respectively. https://doi.org/10.1371/journal.pone.0247795.g006

We acknowledge our findings are limited for a few reasons. We only have access to census data for three individual years spanning a decade and a half. A better explanation of the nonlocalized effect of neighborhoods could be achieved by testing our hypotheses on additional years, along with more socioeconomic features to enrich our design tensor.

We used variational inference to approximate the posteriors because of its effectiveness and versatility. Although our decision to use variational Bayes is substantive, future studies can further examine the costs and benefits of estimating posteriors with variational inference compared with other classical Bayesian methods. Varying the model parameters temporally ensures a modular design as access to richer, longitudinal sociotechnical data features continues to evolve. However, our evaluation suggests that using fixed parameters over time can substantially reduce the number of tunable parameters in the models, which helps mitigate the bias-variance tradeoff faced by the larger, spatially rich models.

Our geospatial resolution is unfortunately not high enough to identify some dynamics of connected communities. Our method, however, can be implemented similarly regardless of the spatial unit used for the experiment. Our spatial network is mainly based on the road network of Hong Kong, which could be extended to account for roads/bridges and public transport across any desired spatial unit.

Finally, we have only explored a distance-based weighting scheme for the connections across districts in the network. Population density could be included to enrich the socioeconomic effect of neighboring regions on a node within the network (for example, following the theory of intervening opportunities [67]). Other attributes, such as geographic information associated with community health services, could help us assess the value of these centers and reallocate them to more optimized locations.
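As a rough illustration of how a distance-based weighting could be enriched with population, the sketch below builds row-normalized inverse-distance weights and an optional population-scaled variant. The inverse-distance form, the multiplicative population term, the helper name `row_normalized_weights`, and all numbers are assumptions made for this example. The population variant is a simple gravity-style heuristic and not an implementation of the theory of intervening opportunities.

```python
def row_normalized_weights(distances, populations=None):
    """Build neighbor weights that sum to 1 for each district.

    distances: dict mapping district -> {neighbor: positive distance}
    populations: optional dict mapping district -> population; when given,
        each inverse-distance weight is scaled by the neighbor's population
        (an assumed, gravity-style extension).
    """
    weights = {}
    for district, nbrs in distances.items():
        raw = {}
        for neighbor, dist in nbrs.items():
            w = 1.0 / dist  # closer neighbors receive larger weight
            if populations is not None:
                w *= populations[neighbor]
            raw[neighbor] = w
        total = sum(raw.values())
        weights[district] = {n: w / total for n, w in raw.items()}
    return weights


# Hypothetical travel distances along the network (arbitrary units).
distances = {
    "A": {"B": 2.0, "C": 4.0},
    "B": {"A": 2.0},
    "C": {"A": 4.0},
}
# Hypothetical populations (NOT census figures).
populations = {"A": 100, "B": 100, "C": 400}

w_dist = row_normalized_weights(distances)
w_pop = row_normalized_weights(distances, populations)

print("distance only:", w_dist["A"])
print("with population:", w_pop["A"])
```

With distance alone, the closer neighbor B dominates district A's weights. Once population is included, the more populous but more distant neighbor C receives the larger weight, which shows how the choice of scheme can change which neighbors drive the spatial terms.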

Supporting information S1 File. (PDF)

Breast cancer incidence and mortality in a transitioning Chinese population: Current and future trends

A joint analysis of influenza-associated hospitalizations and mortality in Hong Kong

Effect of air pollution on daily mortality in Hong Kong. Environmental Health Perspectives

Air pollution and mortality: Effect modification by personal characteristics and specific cause of death in a case-only study

Leisure time physical activity and mortality in Hong Kong: Case-control study of all adult deaths in 1998

Inequality in men's mortality: The socioeconomic status gradient and geographic context

Mortality, inequality and race in American cities and states

Hémon D. Ecological association between a deprivation index and mortality in France over the period 1997-2001: Variations with spatial scale, degree of urbanicity, age, gender and cause of death

The development of a standardized neighborhood deprivation index

Predicting mortality from 57 economic, behavioral, social, and psychological factors

How differential privacy will affect our understanding of health disparities in the United States

Socioeconomic disparities in air pollution-associated mortality

Socioeconomic disparity in mortality risks widened across generations during rapid economic development in Hong Kong: An age-period-cohort analysis from 1976 to 2010

Sociodemographic and morbidity indicators of need in relation to the use of community health services: Observational study

Is inequality at the heart of it? Cross-country associations of income inequality with cardiovascular diseases and risk factors

An examination of the relationship between neighborhood income inequality, social resources, and obesity in Los Angeles county

The role of geographic scale in testing the income inequality hypothesis as an explanation of health disparities

Tract- and county-level income inequality and individual risk of obesity in the United States

Peer Reviewed: Geographic Association Between Income Inequality and Obesity Among Adults

Modeling the Importance of Within- and Between-County Effects in an Ecological Study of the Association Between Social Capital and Mental Distress. Preventing Chronic Disease

The relationship of income inequality to mortality: Does the choice of indicator matter?

Relation between income inequality and mortality in Canada and in the United States: Cross sectional assessment using census data and vital statistics

Neighborhood socioeconomic deprivation and mortality: NIH-AARP diet and health study

Income inequality and income segregation

Exploring the inequality-mortality relationship in the US with Bayesian spatial modeling

Geographically weighted regression: A natural evolution of the expansion method for spatial data analysis

Geographically weighted regression: The analysis of spatially varying relationships

The effects of air pollution on mortality in socially deprived urban areas in Hong Kong

Association between unemployment, income, education level, population size and air pollution in Czech cities: Evidence for environmental inequality? A pilot national scale analysis

Socioeconomic status, particulate air pollution, and daily mortality: Differential exposure or differential susceptibility

Air quality and social deprivation in four French metropolitan areas: A localized spatio-temporal environmental inequality analysis

Persistent clusters of mortality in the United States

Assessing spatial associations between thermal stress and mortality in Hong Kong: A small-area ecological study

Tertiary Planning Units

Quantifying and explaining variation in life expectancy at census tract, county, and state levels in the United States

Measuring socioeconomic inequality in health, health care and health financing by means of rank-dependent indices: A recipe for good practice

Journal of Policy Analysis and Management: The Journal of the Association for Public Policy Analysis and Management

Peer effects in education: How might they work, how big are they and how much do we know thus far? In: Handbook of the Economics of Education

Conforming and non-conforming peer effects in vaccination decisions

The teenage brain: Peer influences on adolescent decision making. Current Directions in Psychological Science

Notes on continuous stochastic phenomena

Beyond Moran's I: Testing for spatial dependence based on the spatial autoregressive model

Social capital and human mortality: Explaining the rural paradox with county-level mortality data

Exploring geographic variation in US mortality rates using a spatial Durbin approach. Population, Space and Place

Interdependence and the cost of uncoordinated responses to COVID-19

Population Census Data for Hong Kong

Micro-data set of known and registered deaths in Hong Kong

Annual Long Term Business Statistics

District and Constituency Area

A spatial approach for the epidemiology of antibiotic use and resistance in community-based studies: The emergence of urban clusters of Escherichia coli quinolone resistance in Sao Paulo

Variational algorithms for approximate Bayesian inference

Stochastic variational inference

Variational inference for large-scale models of discrete choice

Black Box Variational Inference

Automatic differentiation variational inference

Automatic variational inference in Stan

Variational inference: A review for statisticians

Propagation algorithms for variational Bayesian learning

Bayesian statistics and marketing

Bayesian reasoning and machine learning

Advances in variational inference

Optimization methods for large-scale machine learning

Empirical determination of geometric parameters for selective omission in a road network

Vulnerability analysis for large-scale and congested road networks with demand uncertainty

Deep Universal Probabilistic Programming

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Intervening opportunities: A theory relating mobility and distance

The authors are thankful for the computing resources provided by the Vermont Advanced Computing Core, and to Hong Kong's Census and Statistics Department for facilitating access to their mortality dataset. The authors are grateful for useful conversations with Adam Fox, Marc Maier, and Xiangdong Gu. We also thank Melissa Rubinchuk, Josh Minot, Michael Arnold, Anne Marie Stupinski, Colin Van Oort, and many of our colleagues at the Computational Story Lab for their discussions and feedback on this project.