title: Causal inference for multiple continuous exposures via the multivariate generalized propensity score
authors: Williams, Justin R.; Crespi, Catherine M.
date: 2020-08-31
The generalized propensity score (GPS) is an extension of the propensity score for use with quantitative or continuous exposures (e.g., dose of medication or years of education). Current GPS methods allow estimation of the dose-response relationship between a single continuous exposure and an outcome. However, in many real-world settings, there are multiple exposures occurring simultaneously that could be causally related to the outcome. We propose a multivariate GPS method (mvGPS) that allows estimation of a dose-response surface that relates the joint distribution of multiple continuous exposure variables to an outcome. The method involves generating weights under a multivariate normality assumption on the exposure variables. Focusing on scenarios with two exposure variables, we show via simulation that the mvGPS method can achieve balance across sets of confounders that may differ for different exposure variables and reduces bias of the treatment effect estimates under a variety of data generating scenarios. We apply the mvGPS method to an analysis of the joint effect of two types of intervention strategies to reduce childhood obesity rates.
Analyzing data from observational studies and non-randomized experiments to estimate causal effects is challenging due to potential confounding between the exposure and outcome. Techniques such as propensity score methods can help to remove sources of potential confounding and return valid estimates of the treatment effect [39]. Propensity score methods were first developed for binary treatments and then extended to categorical, or multiple, treatments, which introduced the term "generalized propensity score" (GPS) [21]. In the context of a categorical treatment variable, the GPS corresponds to the conditional probability of receiving a particular treatment given a set of confounders. Following the extension to categorical treatments, the GPS was adapted to the setting of continuous exposures via the use of conditional densities [16, 20]. Originally, the GPS for continuous exposures was estimated using Gaussian densities, with adjustment for confounding accomplished through either covariate regression [16] or stratification [20]. Several recent methods have aimed at increasing flexibility for estimating the GPS by using gradient boosting [54], kernel smoothing [9, 24], or ensemble algorithms [27]. Other methods focus on the use of weights [37], including incorporating covariate balancing properties in estimation using a penalized likelihood approach [10] or constrained optimization on the entropy of the weights [45, 49]. All of these methods have maintained the assumption that the exposure is univariate, i.e., a single continuous treatment variable. Methods to accommodate multiple simultaneous treatment exposures have been only briefly mentioned in the literature [20]. There are many situations in which evaluating the combined effect of multiple simultaneous exposures is critical to answering scientific questions. In medicine, combination therapies, which involve the patient taking several medications simultaneously, have been shown to be effective for treating Crohn's disease [6], cancer [22], hypertension [13], and HIV [34].
Typical methods for estimating the dose-response surface of combination treatments require careful design with repeated randomized experiments in order to estimate the optimal combination of treatment doses [25]. When such experimentation is not feasible, researchers may wish to use available data from observational or non-randomized studies to estimate the joint effects. For example, there is currently interest in studying potential combination therapies for COVID-19 using available data from non-randomized studies. However, determining a potentially beneficial dose of several medications may be complicated due to confounding by patient demographic characteristics, comorbidities or other factors [11, 12, 43, 44]. In this article we develop methods to estimate the causal effects of multiple continuous exposures occurring simultaneously, using data from a non-randomized study. We develop a general framework for estimating the causal effects of a multivariate exposure of arbitrary dimension, but focus on bivariate exposures in our simulations and motivating example. The primary objective is unbiased estimation of the dose-response surface of the average outcome given a particular combination of exposure values. We propose methods for estimating weights using a multivariate generalized propensity score, which we call mvGPS, and use weighted regression to estimate the dose-response surface. Our methods rely on the assumption that the exposure variables have a multivariate normal distribution. In Section 2, we introduce our motivating example, which involves assessing the joint effect of two types of intervention strategies for reducing childhood obesity rates in Los Angeles County. In Section 3, we propose a method of causal inference with multiple simultaneous continuous exposures using the multivariate generalized propensity score. Section 4 presents a simulation study designed to highlight strengths and limitations of the methodology. Section 5 applies the proposed methods to our motivating example. A discussion in Section 6 concludes the paper. Obesity rates among low-income preschool-aged children in Los Angeles County were consistently higher than the national average in 2003-2009, with about 20% of such children classified as obese (BMI ≥ 30 kg/m²) [35]. In response, several organizations, including the Los Angeles County Department of Public Health, First 5 LA, Nemours, and the Special Supplemental Nutrition Program for Women, Infants and Children (WIC), implemented programs and policies aimed at reducing childhood obesity in the county. The interventions used a wide variety of different approaches and reflected a large investment of resources. The Early Childhood Obesity Systems Science (ECOSyS) study, funded by National Institutes of Health grant R01 HD072296, sought to evaluate the impact of these programs on childhood obesity prevalence. To this end, ECOSyS collected information on the nature, timing, location and reach of programs implemented in the county in 2003-2016. The research team also developed a method of calculating a "community intervention dose index" that aggregates exposure to childhood obesity interventions over multiple different programs [52]. The community intervention dose index is calculated using a multistep procedure. Each program is coded for its location and year of implementation, its extent of reach into the target population, and which of nine different intervention strategies it used.
The nine strategies are listed in Table 1. The strategies are categorized as "micro" strategies, which target specific individuals, or "macro" strategies, which target a population at large. By aggregating over the strategies implemented in a particular location during a particular year, strategy-specific as well as total micro and macro intervention dose indices can be calculated. For this paper, we focus on intervention exposures stemming from WIC programs. WIC serves low-income families and has seven agencies within Los Angeles County with approximately 90 clinics. While WIC offers many regular services, primarily food assistance and nutrition education that are uniform from clinic to clinic, WIC agencies can also receive additional funding to implement intervention programs. These programs are implemented non-randomly at clinics due to differences in community needs and other considerations. Our motivating example focuses on WIC intervention programs implemented in 2010-2016, given that a major change in the WIC food package occurred in 2009 that may have altered family behaviors and neighborhood food environments [15, 47]. A total of 32 WIC intervention programs implemented in Los Angeles County from 2010-2016 were cataloged by the ECOSyS research team. Information about each program was obtained, including how it was implemented, the estimated reach in terms of client participation, which clinics participated, and how long the program was active. These exposures were mapped to census tracts where WIC-participating children live by identifying the implementing WIC clinics and then defining a catchment area around each clinic intended to capture the exposed population. Catchment areas were defined for each clinic using records of client attendance, and varied by strategy type (macro vs. micro). Children living in census tracts that fell within a catchment area were assumed to be exposed. Exposure values at census tracts were then aggregated across programs by strategy and by year to obtain a single continuous dose for each of the nine intervention strategies. Strategy-specific doses were then summed into macro and micro intervention doses, which were log transformed due to skewness. More details about the dose construction procedure are provided in a forthcoming paper. Figure 1 shows the resultant joint distribution of macro and micro intervention dose for the WIC intervention programs averaged over our defined intervention period, 2010-2016. Each point in the figure represents a census tract. WIC-participating children residing in a particular census tract were presumed to receive the calculated doses. The outcome of interest was change in census tract-level childhood obesity prevalence. Childhood obesity prevalence was measured using administrative records from children ages 2-5 years who participated in WIC in Los Angeles County during 2007-2016, compiled by the WIC Data Mining Project (see https://lawicdata.org for more details). From these records, obesity prevalence by census tract and year was constructed for census tracts with at least 30 WIC-enrolled children. Census tracts used in the analysis were restricted to 8 regions within Los Angeles County that were targeted as part of the ECOSyS data collection effort. This resulted in a total of n = 1079 census tracts, which serve as the units of analysis. The outcome of interest, Y, was the difference in average obesity prevalence between the post-intervention period, 2012-2016, and the pre-intervention period, 2007-2009, i.e., Y = p̄_post - p̄_pre. The post-intervention period was taken to start in 2012 rather than 2010 to account for a potential lag in treatment effects.
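To make the outcome construction concrete, the following minimal R sketch computes Y for each census tract; the data frame `tracts` and its columns `tract`, `year`, and `obesity_prev` are hypothetical placeholders, not the actual WIC Data Mining Project schema.

```r
# Sketch of the outcome construction Y = p_post - p_pre per census tract.
# Assumes a hypothetical long-format data frame `tracts` with columns
# `tract`, `year`, and `obesity_prev` (annual obesity prevalence among
# WIC-enrolled children ages 2-5 in that tract).
library(dplyr)
library(tidyr)

outcome <- tracts %>%
  mutate(period = case_when(
    year %in% 2007:2009 ~ "pre",   # pre-intervention period
    year %in% 2012:2016 ~ "post",  # post period starts 2012 to allow for lag
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(period)) %>%
  group_by(tract, period) %>%
  summarise(mean_prev = mean(obesity_prev), .groups = "drop") %>%
  pivot_wider(names_from = period, values_from = mean_prev) %>%
  mutate(Y = post - pre)  # negative Y: obesity prevalence decreased
```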
We aimed to estimate the dose-response surface of Y, change in childhood obesity prevalence, associated with combinations of macro and micro intervention exposure doses, after removing bias due to non-random assignment of programs. Understanding the simultaneous effect of macro and micro intervention strategy exposures is important to help policy makers make decisions about the allocation of scarce resources to various intervention strategies [40]. Our approach to developing the multivariate generalized propensity score follows the Neyman-Rubin causal model and uses the potential outcome notation introduced by Neyman [30] and made popular by Rubin [41]. Let Y_i denote the outcome of interest for unit i from a population of size n, and let D_i be a vector of length m providing the values of m continuous exposures for unit i. The confounders relevant to each exposure are allowed to differ. Let C_i = {C_i1, ..., C_im} be a set of size m where each element of the set, C_ij, j = 1, ..., m, is a p_j-dimensional vector of baseline confounders associated with the j-th exposure and the outcome. We denote the value of the k-th confounder of the j-th exposure for the i-th individual as C_ijk, with i = 1, ..., n, j = 1, ..., m, and k = 1, ..., p_j. If all exposures have identical confounders, then C_i1 = ... = C_im = C_i and p_1 = ... = p_m = p. The observed data for the i-th unit are represented as (Y_i, D_i1, ..., D_im, C_i11, ..., C_i1p_1, ..., C_im1, ..., C_imp_m). Further, we define the potential outcome Y_i(d) as the outcome that the i-th subject would have if assigned the exposure vector d = (d_1, ..., d_m). We use capital D to represent the multivariate random variable of dose combinations and lowercase d for a particular value in the multidimensional space. Estimation focuses on the average dose-response function, defined as

µ(d) = E[Y(d)],

which is assumed to be well defined for any d ∈ D ⊆ R^m. Note that with a bivariate exposure, i.e., m = 2, µ(d) is a dose-response surface in 3-dimensional space. We make the following identifying assumptions: weak ignorability, positivity, and stable unit treatment value. Weak ignorability, also known as selection on observables or unconfoundedness, states that exposure is conditionally independent of the potential outcomes given the appropriate set of confounders. We write this in the multivariate case as

Y(d) ⊥ D | C_1, ..., C_m for all d ∈ D.

When this assumption holds, we can replace the high-dimensional conditioning set with a scalar value by means of the conditional density function of exposure [39]. In our case the conditional density is defined as the multivariate generalized propensity score (mvGPS), which we denote f_{D|C_1,...,C_m}. Weak ignorability is often the most difficult assumption to rationalize, as it requires perfect knowledge and collection of all possible confounders of the exposures and outcome in the set C. We assume that the set C is well defined and that there is no unmeasured confounding. The second assumption, positivity, states that all units have the potential to receive any particular level of exposure given any value of the confounders. In notation, we have

f_{D|C_1,...,C_m}(d | c_1, ..., c_m) > 0 for all d ∈ D and all possible confounder values.

This assumption requires that we carefully define D such that all units have the potential to receive any particular value in the domain.
In the case of a univariate continuous exposure, positivity is often enforced by restricting estimation to either the observed range of exposure or a trimmed version of it [7]. For example, using the observed range we would define D = [d_0, d_1], where d_0 and d_1 correspond to the minimum and maximum observed exposure. In the case of a multivariate exposure, a natural inclination might be to extend this approach to multiple dimensions by setting D = G, where G is the Cartesian product of the observed marginal ranges,

G = [d_01, d_11] × ... × [d_0m, d_1m],

with d_0j and d_1j the minimum and maximum observed exposure, respectively, along dimension j. However, when exposure variables are correlated, i.e., Cov(D_j, D_j') ≠ 0 for j ≠ j', the region G may include areas with few or no observations. Instead, we propose defining the estimable region for multivariate exposures as D = H ⊂ G, where H is the convex hull of the multivariate exposure [5]. Using a convex hull ensures that inference is restricted to regions where data are observed and avoids extrapolating to sparse data regions in the multidimensional space. For the case of m = 2, Figure 2 shows the difference between regions G and H when Cov(D_1, D_2) = 0.5. Additionally, similar to the univariate case, we can define trimmed versions of G or H. By specifying a value q ∈ [0.5, 1], we construct G_q using trimmed minimum and maximum values d_0j^q = Q(D_j, 1 - q) and d_1j^q = Q(D_j, q), where Q(·, q) is the sample quantile function. To create the trimmed convex hull, H_q, we recalculate the convex hull using the subset of observations that falls within the trimmed minimum and maximum across all exposure dimensions.
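A minimal R sketch of the three candidate regions for m = 2 follows. Base R's chull() computes the 2D convex hull; the point-in-polygon test via the sp package is a convenience assumed here for illustration, not necessarily the authors' implementation.

```r
# Sketch: rectangular region G vs. convex hull H, and the trimmed hull H_q,
# for a simulated bivariate exposure with correlation 0.5.
set.seed(1)
D <- MASS::mvrnorm(500, mu = c(0, 0), Sigma = matrix(c(1, 0.5, 0.5, 1), 2))

# G: Cartesian product of observed marginal ranges [d_01, d_11] x [d_02, d_12].
G <- apply(D, 2, range)

# H: convex hull of the observed points (chull returns the vertex indices).
H <- D[chull(D), ]

# Trimmed hull H_q: keep observations inside the [Q(1 - q), Q(q)] box on
# every exposure dimension, then recompute the hull (here q = 0.95).
q  <- 0.95
lo <- apply(D, 2, quantile, probs = 1 - q)
hi <- apply(D, 2, quantile, probs = q)
keep   <- D[, 1] >= lo[1] & D[, 1] <= hi[1] & D[, 2] >= lo[2] & D[, 2] <= hi[2]
D_trim <- D[keep, ]
H_q    <- D_trim[chull(D_trim), ]

# Restrict inference to dose combinations inside H_q:
d_new  <- c(0.2, -0.1)  # a candidate dose combination
inside <- sp::point.in.polygon(d_new[1], d_new[2], H_q[, 1], H_q[, 2]) > 0
```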
The final assumption is the stable unit treatment value assumption (SUTVA), which states that the potential outcome of each unit does not depend on the exposure that other units receive and that there exists only one version of each exposure [42]. This assumption rules out potential interference between units and other errors in defining the potential outcomes caused by multiple versions of the exposure. Therefore the potential outcomes are well defined for each unit, and the observed outcome given exposure D = d corresponds to the potential outcome, i.e., Y_i(d) = Y_i. We discuss the tenability of this assumption for our data application in the Discussion. Using the identifying assumptions above, there are a variety of methods to estimate the dose-response function, including covariate adjustment [16] and stratification [20]. We focus on weighted estimation, originally proposed for binary treatments with marginal structural models [38] and motivated by weights used in survey sampling [17]. We aim to construct a set of weights, w, that when applied to the observed data return a consistent estimate of the average dose-response function, i.e.,

E[wY | D = d] = µ(d).   (1)

In the case of a univariate continuous exposure, weights are constructed by either estimating the generalized propensity score [10, 16, 20, 24, 54] or by direct optimization using an entropy loss function [45, 49]. We choose to extend the generalized propensity score by using an appropriately defined multivariate conditional distribution, which we refer to as the multivariate generalized propensity score (mvGPS). The weights are thus constructed as the ratio of the multivariate marginal density to the conditional density,

w = f(D) / f(D | C_1, ..., C_m),   (2)

where the numerator is the marginal density of the multivariate exposure and the denominator is the mvGPS. These weights are referred to in the literature as stabilized inverse probability of treatment weights (IPTW) [38]. To motivate the intuition behind constructing weights in this manner, note that w = 1 when the probability of exposure is independent of the confounding set C, i.e., f(D | C_1, ..., C_m) = f(D), which would hold in the case of a randomized experiment. For tractability, we propose using multivariate normal models for both densities, i.e.,

D | C_1, ..., C_m ∼ N_m((β_1^T C_1, ..., β_m^T C_m)^T, Σ) and D ∼ N_m(µ_D, Ω),

where each β_j^T is a row vector of length p_j corresponding to the effect of the set of confounders C_j on D_j. By factorizing both the numerator and denominator in Equation 2, we can compute w using full conditionals, i.e.,

w = [f(D_1) ∏_{j=2}^{m} f(D_j | D_{j-1}, ..., D_1)] / [f(D_1 | C_1) ∏_{j=2}^{m} f(D_j | D_{j-1}, ..., D_1, C_j)],   (3)

where each conditional expression is univariate normal. The factorization of the denominator uses the fact that the j-th exposure is independent of the confounders of the other exposures given C_j, i.e., D_j ⊥ C_{-j} | C_j, where C_{-j} represents the set of confounders excluding C_j, i.e., C_{-j} = {C_1, ..., C_m} \ {C_j}. Evaluating only the conditional densities reduces computational burden by eliminating the need to directly estimate the covariance matrices Σ and Ω. Let θ be the collection of mean and variance parameters from all of the univariate normal densities in Equation 3. Estimation of the parameters to obtain θ̂ proceeds by maximizing each univariate conditional density via least squares. The weight for the i-th subject, w_i, is obtained by evaluating the densities at θ̂ with the values of the observed exposures, D_i1, ..., D_im, and confounders, C_i1, ..., C_im. When the weights are properly specified, the weighted covariance between each exposure D_j and confounder C_jk, for j = 1, ..., m and k = 1, ..., p_j, is zero:

E[w D_j C_jk] - E[w D_j] E[w C_jk] = 0.

This balancing property of the weights serves as an important diagnostic when using the mvGPS as part of a causal analysis [2]. Weights that do not reduce the exposure-confounder correlation suggest that the distributional assumptions are invalid, that the propensity equations are misspecified, or that there are insufficient data, as balance is achieved asymptotically. Further, it follows that these weights are already normalized, i.e., E[w] = 1, and that they maintain the marginal moments of D and C, i.e., E[w D_j] = E[D_j] and E[w C_jk] = E[C_jk], where the expectations are taken with respect to the joint density f(D, C_1, ..., C_m). It remains to show that the weights as constructed satisfy Equation 1. To do this we follow the logic proposed by Robins on using IPTW to correct for confounding [37]. We first note that the joint density of the potential outcome can be factorized as

f(Y(d), D, C_1, ..., C_m) = f(Y(d) | D, C_1, ..., C_m) f(D | C_1, ..., C_m) f(C_1, ..., C_m)
                          = f(Y(d) | C_1, ..., C_m) f(D | C_1, ..., C_m) f(C_1, ..., C_m),

where the second line follows from the assumption of weak ignorability. We can then let f(D) be a density for our multivariate exposure and construct a new joint density f* in which we replace f(D | C_1, ..., C_m) with f(D), as would be the case if the exposures were independent of the confounders. This new density is written as

f*(Y(d), D, C_1, ..., C_m) = f(Y(d) | C_1, ..., C_m) f(D) f(C_1, ..., C_m),

where the marginal mean of the potential outcomes is equivalent under either joint density f or f*, i.e., E[Y(d)] = E*[Y(d)]. Using this new density we can write our dose-response function as

µ(d) = E*[Y(d)] = E*[Y(d) | D = d] = E*[Y | D = d],

using the SUTVA assumption for the last equality. The resulting expression, E*[Y | D = d], is equivalent to the mean expression in a linear regression of the outcome on the observed exposures. Finally, we have

E*[Y | D = d] = E[wY | D = d],

which gives us the result from Equation 1 that our weighted regression does indeed provide a consistent estimate of the dose-response function.
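As a concrete illustration of Equations 2 and 3 for m = 2, the following R sketch fits each univariate normal conditional by least squares and forms the stabilized weights. The function name and structure are ours, written for exposition; this is not the mvGPS package API.

```r
# Minimal sketch of mvGPS weight construction for m = 2 (Equation 3).
# D1, D2: observed exposure vectors; C1, C2: data frames of the confounders
# of each exposure.
mvgps_weights <- function(D1, D2, C1, C2) {
  # Denominator (the mvGPS): f(D1 | C1) * f(D2 | D1, C2), each factor a
  # univariate normal whose mean is fit by least squares.
  den1 <- lm(D1 ~ ., data = data.frame(C1))
  den2 <- lm(D2 ~ ., data = data.frame(C2, D1 = D1))
  f_den <- dnorm(D1, mean = fitted(den1), sd = sigma(den1)) *
           dnorm(D2, mean = fitted(den2), sd = sigma(den2))

  # Numerator (marginal density of exposure): f(D1) * f(D2 | D1).
  num2 <- lm(D2 ~ D1)
  f_num <- dnorm(D1, mean = mean(D1), sd = sd(D1)) *
           dnorm(D2, mean = fitted(num2), sd = sigma(num2))

  f_num / f_den  # stabilized inverse probability of treatment weights
}

# Balance diagnostic: weighted exposure-confounder correlations should be
# near zero when the weights are properly specified.
# w <- mvgps_weights(D1, D2, C1, C2)
# cov.wt(data.frame(D1, D2, C1, C2), wt = w, cor = TRUE)$cor
```

Working with the full conditionals, as here, avoids estimating the covariance matrices Σ and Ω directly: each factor is an ordinary least squares fit.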
We conducted a simulation study to demonstrate the performance of the mvGPS method under different scenarios of confounding and to compare it to three commonly used univariate methods. The univariate methods were entropy balancing [45], the covariate balancing generalized propensity score (CBGPS) [10], and the generalized linear propensity score (PS). The entropy balancing method uses non-parametric constrained optimization with an entropy loss function to solve for weights without specifying a propensity score model. CBGPS attempts to achieve propensity score specification and covariate balance simultaneously by introducing a penalty term into the likelihood. The PS method uses univariate normal densities for the marginal distribution of exposure and the generalized propensity score, without balance constraints. Although these univariate methods can handle only single exposure variables, we expected that they might perform adequately when the multiple exposure variables are highly correlated and share the same confounders. However, when exposure variables have separate sets of confounders and/or are only weakly correlated, we expected the mvGPS method to outperform the univariate methods. In our simulations we focus exclusively on a bivariate exposure, m = 2, similar to that found in our motivating example. For each simulated data scenario, each univariate method was applied twice, once to each exposure variable, with each such application yielding a set of weights that were used to assess balance on confounders and estimate the dose-response function. The first step of the simulation is to draw the vector of covariates X for each unit. We assume that there are a total of 10 covariates collected prior to exposure and that the covariates follow a multivariate normal distribution, X ∼ N_10(0, Σ_X), where the covariance matrix Σ_X is compound symmetric with variance 1 and covariance 0.2, creating a set of correlated covariates. Realizations of the conditional distribution of the bivariate continuous exposure levels, D = (D_1, D_2)^T given X, were then generated as bivariate normal,

D | X ∼ N_2(βX, Σ_{D|X}),

where β is a 2 × 10 matrix with row vectors β_1^T and β_2^T representing the effects of X on D_1 and D_2, respectively, and Σ_{D|X} is the 2 × 2 conditional covariance matrix. For all simulations the conditional standard deviation of each exposure was set to 2, while the conditional correlation ρ_{D|X} was allowed to vary over {0, 0.1, 0.3, 0.5, 0.7, 0.9}. Note that the marginal covariance matrix of the exposures is Σ_D = Σ_{D|X} + βΣ_Xβ^T. This means that the marginal correlation of the two exposure variables, ρ_D, depends on their conditional correlation ρ_{D|X}, the covariance of X, and the degree of overlap of the covariates. The degree of overlap is reflected in the number of non-zero elements that are common between β_1^T and β_2^T. As the degree of overlap increases, the marginal correlation also increases. Since Σ_X is compound symmetric with constant covariance 0.2, the marginal correlation is guaranteed to be non-zero even with zero overlap and zero conditional correlation. Finally, the outcome Y was sampled from a univariate normal distribution conditional on D and X as

Y | D, X ∼ N(α_X^T X + α_D^T D, σ_Y²),

where α^T = (α_X^T, α_D^T) is a 1 × 12 vector of coefficients, which we separate into α_X^T, a 1 × 10 vector representing the effects of the covariates on the outcome, and α_D^T, a 1 × 2 vector corresponding to the treatment effects. In all simulations the conditional standard deviation of the outcome was equal to 4, i.e., σ_Y = 4. Three scenarios were constructed to reflect different degrees of overlap in confounding for the two exposures: M1: No Common Confounding, M2: Partially Common Confounding, and M3: Common Confounding.
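A sketch of one draw from scenario M1 under these settings appears below; the non-zero coefficient values in beta and alpha_X are placeholders standing in for the actual values in Table 2, chosen only to respect the M1 structure (D_1 driven by X_1-X_5, D_2 by X_6-X_10, true confounders X_2, X_4, X_6, X_9).

```r
# One simulated data set under scenario M1 (no common confounding), n = 200.
library(MASS)
set.seed(2020)
n <- 200; p <- 10

# Covariates: X ~ N_10(0, Sigma_X), compound symmetric covariance.
Sigma_X <- matrix(0.2, p, p); diag(Sigma_X) <- 1
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma_X)

# Exposures: D | X ~ N_2(beta X, Sigma_D_X); each row of beta is non-zero
# only for that exposure's own covariates (placeholder values of 1).
beta <- rbind(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),   # D1 depends on X1-X5
              c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1))   # D2 depends on X6-X10
rho <- 0.3                                       # conditional correlation
Sigma_D_X <- matrix(c(4, rho * 4, rho * 4, 4), 2)  # conditional SD = 2
D <- X %*% t(beta) + mvrnorm(n, mu = c(0, 0), Sigma = Sigma_D_X)

# Outcome: true confounders X2, X4 (for D1) and X6, X9 (for D2), placeholder
# effects of 1; treatment effects alpha_D = (1, 1); sigma_Y = 4.
alpha_X <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 0)
alpha_D <- c(1, 1)
Y <- as.numeric(X %*% alpha_X + D %*% alpha_D + rnorm(n, sd = 4))
```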
Directed acyclic graphs (DAGs) for each scenario are shown in Figure 3. Tables 2, 3, and 4 display the coefficients in the vectors β_1^T, β_2^T and α^T for each scenario. In M1, the two exposures D_1 and D_2 each have five covariates, with none in common; for each exposure, two of the covariates are true confounders (i.e., also associated with Y). The outcome Y is a function of the two exposures as well as the four true confounders, none of which are shared between D_1 and D_2. In M2, D_1 and D_2 again have five covariates each, but they share three in common. Two of the shared covariates are true confounders, and each exposure has a confounder that is not shared with the other exposure. The outcome Y is again a function of the two exposures and the four true confounders, two of which are shared and two of which are not. In M3, D_1 and D_2 share the same five covariates. Four of these are true confounders, and Y is a function of the two exposures and the four common confounders. In all scenarios, the treatment effect of each exposure was set to 1, i.e., α_D^T = (1, 1). The three simulation scenarios were run with a sample size of n = 200 for a total of B = 1000 Monte Carlo repetitions using R Version 4.0 [36]. For each repetition, weights were estimated using mvGPS and the three univariate methods with the proper set of confounders specified for each exposure. For example, for scenario M1, weights were constructed for D_1 using X_2 and X_4, while D_2 depended on X_6 and X_9 (see Table 2). The three univariate methods, entropy balancing, CBGPS and PS, were implemented using the WeightIt package in R [14]. The parametric version of CBGPS was used. The mvGPS method was implemented using the mvGPS package in R. All methods were compared against an unweighted approach, equivalent to applying a weight of 1 to all observations. Weighted Pearson correlations between the exposures and confounders were used to assess balancing performance [2, 54]. We examined the maximum absolute correlation, which reflects the most imbalanced confounder after weighting and has been shown to be a key metric for assessing balance [8], and the average absolute correlation, which summarizes how well balance is achieved over all confounders. These correlation values were taken over both sets of exposures. Effective sample sizes, (Σ_i w_i)² / Σ_i w_i², were calculated to summarize the relative power of each method [26]. The weights were then used to estimate the dose-response model. The performance metrics were absolute total bias, Σ_j |α̂_Dj - α_Dj| for j = 1, 2, and root mean squared error, √((1/500) Σ_i (y_i* - ŷ_i)²), where the 500 evaluation points, y_i*, i = 1, ..., 500, were drawn from a uniform grid on the convex hull, H, of the observed joint distribution of the two exposures. Each of the metrics was averaged over the 1000 repetitions. All methods reduce the effective sample size when weights are applied to the sample. Of particular concern for practitioners are extreme weights. When sample sizes are small or moderate, extreme weights can have an outsized influence. They may also result in limited power to detect treatment effects and erratic estimation [23]. One remedy is to trim extreme weights [18, 28]. As all simulations were run with moderate sample sizes, i.e., n = 200, we wanted to test performance when weight trimming was applied, as might be done in practice by analysts faced with extreme weights. Our simulation analyses were thus repeated using trimmed weights for each method, w_q, where q ∈ {0.99, 0.95}. Weights were trimmed at both the upper and lower bounds of the respective sample percentiles, such that values above or below the thresholds were replaced with the threshold value.
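These diagnostics and the trimming rule are simple functions of the weight vector; a minimal base R sketch follows (the helper names are ours, not from any package).

```r
# Weighted Pearson correlation between one exposure and one confounder.
w_cor <- function(d, c, w) {
  cov.wt(cbind(d, c), wt = w, cor = TRUE)$cor[1, 2]
}

# Maximum and average absolute exposure-confounder correlation over all
# pairs (D: n x m exposure matrix; C_list: list of confounder matrices,
# one per exposure; w: weight vector).
balance_summary <- function(D, C_list, w) {
  r <- unlist(lapply(seq_len(ncol(D)), function(j)
    apply(C_list[[j]], 2, function(ck) abs(w_cor(D[, j], ck, w)))))
  c(max_abs_cor = max(r), mean_abs_cor = mean(r))
}

# Effective sample size: (sum w)^2 / sum(w^2).
ess <- function(w) sum(w)^2 / sum(w^2)

# Symmetric trimming at sample quantile q: weights outside the
# [1 - q, q] percentile band are replaced with the band endpoints.
trim_weights <- function(w, q = 0.99) {
  lo <- quantile(w, 1 - q)
  hi <- quantile(w, q)
  pmin(pmax(w, lo), hi)
}
```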
Figure 4 plots the maximum absolute correlation between the exposures and confounders for each method, along with the original unweighted correlations for comparison. In general, with no common confounding or partially common confounding, the mvGPS method substantially outperformed each univariate method. However, under common confounding, mvGPS performs best only when the marginal correlation is low. The performance of the univariate methods tended to cluster differently based on the degree of confounding overlap. In models with low overlap, performance clustered by exposure, D_1 or D_2, but as the degree of overlap increased, performance became clustered by type of estimation: entropy balancing, CBGPS, or PS. Applying trimmed weights, we see a slight improvement for q = 0.99, while q = 0.95 has little additional effect. Figure 5 shows the average absolute exposure-covariate correlation, along with a reference line at 0.1, a benchmark suggesting sufficient covariate balance [54]. For all simulation models, the mvGPS is consistently near the 0.1 threshold. For the univariate methods, we see trends in performance similar to those observed for the maximum correlation, but the differences between methods are smaller. Entropy balancing consistently had the lowest average correlation, especially with high overlap or high marginal correlation. Trimming the weights tended to eliminate the effect of the marginal correlation on the mvGPS, resulting in flatter lines for q = 0.99 and q = 0.95, particularly for models with at least some common confounding. Figure 6 displays trends in effective sample size for each method across the various simulation scenarios, with a reference line at 100, which is often a minimum desirable quantity for inference in the dose-response model [49]. The mvGPS method tends to have lower effective sample sizes than the univariate methods. Importantly, using untrimmed weights, the mvGPS method has effective sample sizes less than 100 in the presence of partial or common confounding, indicating particularly low power. Trimming the weights increases the effective sample size for all methods, and the difference between methods decreases as q increases. Figure 7 shows the results of each method with respect to total absolute bias for the treatment effects estimated from weighted regression. Generally, the mvGPS method has the lowest total bias, with the exception of high correlation in the models with partially common or no common confounding. Of particular note, although the effective sample size and balancing diagnostics were lower for mvGPS in the common confounding model, it substantially outperformed all univariate methods with respect to bias even with high marginal correlation. We also observe that certain univariate methods have greater bias than the unweighted estimates, such as those that estimate weights using D_2 in the common confounding model. Trimming the weights tended to reduce bias for mvGPS when the correlation of the exposures was high, particularly under models with partially common or no common confounding, while slightly increasing the bias in the common confounding models. Figure 8 shows the root mean squared error based on 500 points sampled along a grid from the convex hull, H_q, of the exposures.
The precision of predicted values for the mvGPS is often worse than that of the univariate methods. As ρ increases, the performance of the mvGPS method worsens, reflecting decreased power from low effective sample sizes. Trimming the weights helps counteract this trend and reduces the root mean squared error across all methods. In summary, using a multivariate method for weight estimation is critical to achieving balance, as univariate methods in general do not effectively balance the confounders of both exposures. The multivariate method protects against any single confounder being strongly imbalanced across either exposure, at the expense of slightly lower average balance, while the univariate methods can leave large imbalance on the unused exposure dimension. The notable exception is when there is high overlap in the confounders. In this case univariate methods can sufficiently balance the confounders, particularly when the marginal correlation of the exposures is high. However, despite achieving balance in these scenarios, the univariate methods still produced high total bias in the treatment effect estimates. Although mvGPS weights were advantageous with respect to balance and bias, they tended to produce smaller effective sample sizes, resulting in lower power and higher root mean squared error. Weight trimming, particularly with q = 0.99, offers a potential remedy that reduces these effects while maintaining balance and low bias. The WIC program is designed to provide nutrition education, 'vouchers' for selected healthy foods, and referrals. The rapid increase in childhood obesity rates in the early 2000s led to interventions to increase physical activity and improve access to healthy food, especially in communities where affordable fresh produce is not available. In this motivating example, we estimated the causal effects of interventions that were implemented by individual WIC clinics in attempts to meet the specific needs of the communities they served. As discussed in Section 2, the intervention programs were classified as using macro or micro strategies, and we calculated two continuous dose measures for n = 1079 census tracts. The outcome was the difference in average obesity prevalence between the post-intervention period, 2012-2016, and the pre-intervention period, 2007-2009, calculated as Y = p̄_post - p̄_pre at the census tract level. Negative values of Y indicate that the prevalence of obesity decreased. We hypothesized that areas with higher macro and micro doses would have the greatest reductions in obesity rates. Data on potential census tract-level confounders came from three sources: US Census American Community Survey (ACS) 5-year estimates [46], WIC administrative data, and the National Establishment Time-Series (NETS) [51]. Variables from the ACS captured community-level demographic characteristics such as median household income, education level, primary language spoken at home, and ethnicity, which have been shown in previous research to be associated with obesity rates [31, 32]. WIC administrative data were used to calculate average pre-treatment overweight and obesity prevalence for each census tract. Overweight and obesity prevalence were considered potential confounders because agencies may have directed interventions towards clinics in higher prevalence regions. Finally, NETS provided information on neighborhood food environments, specifically the density per square mile of unhealthy and healthy food outlets [1, 53].
Previous analysis has shown that higher density of healthy outlets is associated with lower obesity prevalence among low-income preschool-aged children in Los Angeles County [4]. Both the macro and micro propensity dose equations included the same set of potential confounders from these three data sources, but each exposure was assessed separately to determine whether higher-order polynomial terms were needed for any confounders. These analyses showed that both macro and micro dose had quadratic relationships with education level and density of food outlets, while only macro dose showed evidence of a quadratic relationship with median household income. After defining the appropriate functional form for each exposure and confounder, weights were estimated using the mvGPS method and the three univariate methods discussed in Section 4.1. To maintain the assumption of positivity, data used for estimating the weights were restricted to the trimmed convex hull H_0.95 shown in Figure 1, where a bivariate normal distribution is plausible. The marginal correlation of the exposures was moderate, r = 0.28, in this high-density region. As both exposures had nearly identical sets of potential confounders, the data generating mechanism was akin to the common confounding scenario described in the simulations. Therefore, to protect against extreme weights and reduce the variability of the resulting dose-response estimates, weights for each method were trimmed using q = 0.99. Table 5 shows the balancing diagnostics, maximum absolute correlation and average absolute correlation, and the effective sample sizes. The confounders were substantially imbalanced prior to weighting, with the average absolute correlation above 0.2 and the maximum absolute correlation above 0.4. All methods improved balance, but the mvGPS method achieved a substantially greater reduction in imbalance than the univariate methods. The average absolute correlation and maximum absolute correlation were reduced to 0.04 and 0.10, respectively, after applying mvGPS weights. The effective sample size for mvGPS was reduced from the original sample of n = 1079 to 604. However, since the population included over 1000 census tract units, power remained reasonably high. Finally, we applied weights from the mvGPS method to estimate the joint effect of macro and micro exposure doses on change in obesity prevalence using weighted least squares regression, and compared the results to unweighted estimates. Only exposures within the trimmed convex hull, H_0.95, were used to estimate treatment effects and predict the dose-response surface. The dose-response model for both methods included linear terms for each exposure and an interaction between the two exposures.
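The dose-response step itself reduces to a weighted least squares fit with an interaction term; a sketch follows, assuming a hypothetical data frame `dat` holding the outcome Y, the two log doses, and the trimmed mvGPS weights for tracts inside H_0.95 (the variable names `macro`, `micro`, and `w` are illustrative).

```r
# Weighted and unweighted dose-response surfaces with an interaction term.
fit_w  <- lm(Y ~ macro * micro, data = dat, weights = w)  # mvGPS-weighted
fit_uw <- lm(Y ~ macro * micro, data = dat)               # unweighted

# Predict the surface on an even grid over the observed dose range; in
# practice, grid points falling outside H_0.95 would be dropped before
# plotting, so the surface is not extrapolated beyond the hull.
grid <- expand.grid(
  macro = seq(min(dat$macro), max(dat$macro), length.out = 50),
  micro = seq(min(dat$micro), max(dat$micro), length.out = 50)
)
grid$Y_hat <- predict(fit_w, newdata = grid)
```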
Figure 9 shows the weighted and unweighted dose-response surfaces along with a reference plane of no change in obesity prevalence. Both the weighted and unweighted surfaces suggest reductions in obesity prevalence from our pre-intervention period, 2007-2009, to the post-intervention period, 2012-2016, for all dose combinations. This is consistent with studies showing a decrease in obesity risk among WIC-participating children in Los Angeles County associated with the 2009 change in the WIC food packages [3, 33]. The unweighted dose-response surface is a monotonic plane; increases in micro dose and in macro dose are each associated with greater reduction in obesity prevalence, the associations are additive, and the greatest reduction in prevalence corresponds to the highest levels of macro and micro doses. The mvGPS dose-response surface is more complex and shows an interaction effect. At low levels of macro dose, increases in micro dose are associated with a steep reduction in obesity prevalence. However, as macro dose increases, high micro dose becomes gradually less effective. In the quadrant where both macro and micro dose are high, higher micro doses appear to be less beneficial rather than more beneficial. There are several possible explanations for this result. There could be important confounders that were not accounted for in the analysis. There could also have been measurement error in estimating the exposures. We noted that the data set included several observations with high macro and high micro doses that were assigned high weights and had either no decrease or a slight increase in child obesity prevalence. Further investigation of these census tracts may yield more information and guide model refinements. In this manuscript, we introduced methodology for generating a multivariate generalized propensity score, mvGPS, to be used in estimating the causal effects of multiple simultaneous continuous exposures in observational or non-randomized studies. We have developed the R package mvGPS, available at https://github.com/williazo/mvGPS, to implement the methods. Through simulations we have shown that, when estimating the causal effects of two simultaneous exposures, mvGPS weights are effective at reducing both the maximum and average absolute correlation between exposures and confounders. Further, the weights can reduce bias in estimating the dose-response function in realistic data generating settings with moderate sample sizes. The simulations identified two key factors that affect performance. When the exposures have highly overlapping sets of confounders or large marginal correlation, the mvGPS method may generate extreme weights, resulting in smaller effective sample sizes and higher average root mean squared error. Trimming the weights improves performance in these situations. We suggest that in settings with a high degree of overlap in the confounders or moderate to large marginal exposure correlation, weights be trimmed at q = 0.99. While our method could in principle be extended to an arbitrary number of continuous exposures, we have confined attention in our simulations and application to the setting of two exposures. Assessing the joint effect of two interventions is a common scientific question, and we expect that there are many practical applications of the methods. The resulting dose-response surface for bivariate exposures can be easily visualized and interpreted, which is key in practice. Further work is needed to explore performance in settings with more than two exposures. In particular, positivity and adequate balance may be increasingly difficult to achieve as the dimension of the exposure increases. For higher-dimensional exposures, dimension reduction techniques such as principal component analysis or manifold learning might be applied to transform the problem to a lower-dimensional continuous exposure space.
We applied the mvGPS method to evaluate the joint effectiveness of macro and micro intervention strategies used in childhood obesity programs on change in obesity prevalence among low-income preschool-aged children. Due to non-random selection of participating clinics, there was substantial imbalance on potential confounders, as evidenced by large maximum and average absolute correlations prior to weighting. The mvGPS method achieved superior balance compared to the univariate alternatives, reducing the maximum absolute correlation to the benchmark of 0.1 and the average absolute correlation to nearly zero. Estimates of the dose-response surface using weights from the mvGPS method differed substantially from the unweighted results. The results showed that the most effective intervention combination was higher levels of micro strategies and lower levels of macro strategies. However, our results should be interpreted with caution. As with other causal inference methods, all confounders must be adequately captured and modeled to produce unbiased estimates. Thus our estimated treatment effects may be biased due to unknown confounders. In particular, communities that received higher levels of macro and micro doses may have been inherently more difficult to change due to a complex interplay of community and personal factors not captured by our set of potential confounders. In addition, we have assumed that the potential outcomes of the change in obesity given macro and micro exposures are well defined via SUTVA, as discussed in Section 3.2. Specifically, SUTVA stipulates that multiple versions of the exposures do not exist. In our application, however, there are potentially different sets of interventions that can yield the same macro and micro exposure scores. In the presence of multiple versions of the exposures, the resulting potential outcomes may be unidentifiable. Further, we had to estimate exposure to the interventions at the census tract level using various assumptions, which may have introduced measurement error. Thus our application, while demonstrating the methods, has important limitations. Our approach assumes that the exposures have a multivariate normal distribution. The multivariate normal distribution is particularly attractive for working in higher dimensions. In our case, it allows the full conditionals used to generate weights in Equation 3 to be tractable univariate normal densities. Further, the asymptotics of the estimates are well behaved due to the central limit theorem. We note that it is common practice when using the generalized propensity score for continuous treatments in the univariate case to assume normality [10, 16, 20, 37, 54]. However, this reliance on multivariate normality is an important limitation of the methodology, particularly with respect to assessing its validity [29]. Possible extensions include replacing the multivariate normal distribution with non-parametric or semi-parametric alternatives, as has been done recently for univariate methods [24, 45, 49]. Other possible extensions include allowing for time-varying outcomes and exposures, as has recently been proposed for CBGPS [19]. Additionally, SUTVA might be relaxed to test for potential geographic interference [48, 50].

Table 2 (Scenario M1). All values of the covariates enter each equation linearly with the respective coefficients shown in the table. In this scenario, covariates X_1-X_5 are associated with exposure D_1; however, only X_2 and X_4 are true confounders also associated with Y.
Similarly, covariates X_6-X_10 are associated with exposure D_2; however, only X_6 and X_9 are true confounders also associated with Y. There are no common confounders in this model. Each exposure has a true treatment effect coefficient of 1.

Table 3 (Scenario M2). All values of the covariates enter each equation linearly with the respective coefficients shown in the table. In this scenario, covariates X_3-X_7 are associated with exposure D_1; however, only X_3, X_6, and X_7 are true confounders also associated with Y. Similarly, covariates X_5-X_9 are associated with exposure D_2; however, only X_6, X_7, and X_9 are true confounders also associated with Y. Common confounders of exposures D_1 and D_2 are X_6 and X_7. X_3 is a confounder of D_1 only, and X_9 is a confounder of D_2 only. Each exposure has a true treatment effect coefficient of 1.

Table 4 (Scenario M3). All values of the covariates enter each equation linearly with the respective coefficients shown in the table. In this scenario, covariates X_1-X_5 are associated with exposures D_1 and D_2; however, only X_1, X_3, X_4 and X_5 are true confounders also associated with Y. Common confounders of exposures D_1 and D_2 are X_1, X_3, X_4, and X_5. There are no confounders of D_1 or D_2 only in this model. Each exposure has a true treatment effect coefficient of 1.

Table 5. Assessing balance and effective sample size (ESS) using the various weighting methods for the motivating example. Data used for estimating weights were restricted to H_0.95. The univariate methods were applied separately to the two exposure metrics, macro or micro dose. The correlations are taken over both exposure metrics. The unweighted method represents the original values before applying weights. The weights for each method are trimmed using q = 0.99.

Figure 3. Directed acyclic graphs for each of the three data generating simulation scenarios. D_1 and D_2 are continuous exposure measures and Y is the outcome of interest. C_1 and C_2 represent confounder sets that are specific to exposures D_1 and D_2, while C_12 represents a confounder set common to both exposures.

Figure 6. Rows correspond to the three simulation scenarios, M1, M2 and M3, and each column corresponds to a quantile used for weight trimming. The y-axis is the average effective sample size, (Σ_i w_i)² / Σ_i w_i², for n = 200 from B = 1000 repetitions. The x-axis, ρ, is the marginal correlation of the exposures. The red line corresponds to an effective sample size of 100, which is often a minimum desirable quantity for hypothesis testing and inference in the dose-response model.

Figure 7. Rows correspond to the three simulation scenarios, M1, M2 and M3, and each column corresponds to a quantile used for weight trimming. The y-axis is the average total absolute bias, Σ_j |α̂_Dj - α_Dj|, for n = 200 from B = 1000 repetitions. The x-axis, ρ, is the marginal correlation of the exposures. For univariate methods, weights were generated twice, once for each exposure variable.

Figure 8. Rows correspond to the three simulation scenarios, M1, M2 and M3, and each column corresponds to a quantile used for weight trimming. The y-axis is the average root mean squared error (RMSE) for 500 points sampled on a convex hull H_q grid for n = 200 from B = 1000 repetitions. The x-axis, ρ, is the marginal correlation of the exposures. For univariate methods, weights were generated twice, once for each exposure variable.

Figure 9. Estimated dose-response surface of change in obesity prevalence as a function of log macro and micro dose, obtained using mvGPS weights and unweighted.
The surface is restricted to the convex hull of the observed bivariate exposure, H_0.95, shown in Figure 1, and points are sampled evenly along this grid. A reference plane of no change is included. For a 3D interactive version of the dose-response surface that includes lower and upper bound 95% confidence interval surfaces for the mvGPS method, visit https://williazo.github.io/resources/.

References
[1] The neighborhood food environment modifies the effect of the 2009 WIC food package change on
[2] Assessing covariate balance when using the generalized propensity score with quantitative or continuous exposures
[3] The 2009 Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) food package change and children's growth trajectories and obesity in Los Angeles County
[4] Influences of the neighbourhood food environment on adiposity of low-income preschool-aged children in Los Angeles County: A longitudinal study
[5] An optimal convex hull algorithm in any fixed dimension
[6] Infliximab, azathioprine, or combination therapy for Crohn's disease
[7] Dealing with limited overlap in estimation of average treatment effects
[8] Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies
[9] Estimating the effects of length of exposure to instruction in a training program: The case of job corps
[10] Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements
[11] Hydroxychloroquine and azithromycin as a treatment of COVID-19: Results of an open-label non-randomized clinical trial
[12] Clinical and microbiological effect of a combination of hydroxychloroquine and azithromycin in 80 COVID-19 patients with at least a six-day follow-up: A pilot observational study
[13] Combination therapy in hypertension
[14] WeightIt: Weighting for Covariate Balance in Observational Studies
[15] The impact of WIC food package changes on access to healthful food in 2 low-income urban neighborhoods
[16] Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives
[17] A generalization of sampling without replacement from a finite universe
[18] The performance of estimators based on the propensity score
[19] Covariate balancing inverse probability weights for time-varying continuous interventions
[20] Causal inference with general treatment regimes: Generalizing the propensity score
[21] The role of the propensity score in estimating dose-response functions
[22] Normalizing tumor vasculature with anti-angiogenic therapy: A new paradigm for combination therapy
[23] Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data
[24] Non-parametric methods for doubly robust estimation of continuous treatment effects
[25] Response surface methodology
[26] Survey Sampling
[27] Evaluation of the effect of a continuous treatment: A machine learning approach with an application to treatment for traumatic brain injury
[28] Weight trimming and propensity score weighting
[29] An appraisal and bibliography of tests for multivariate normality
[30] On the application of probability theory to agricultural experiments. Essay on principles
[31] Immigrant enclaves and obesity in preschool-aged children in Los Angeles County
[32] Widening socio-economic disparities in early childhood obesity in Los Angeles County after the great recession
[33] Trends in socioeconomic disparities in obesity prevalence among low-income children aged 2-4 years in Los Angeles County
[34] Decay characteristics of HIV-1-infected compartments during combination therapy
[35] WIC data 2003-2009: A report on low income families with young children in Los Angeles County
[36] R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing
[37] Marginal structural models versus structural nested models as tools for causal inference
[38] Marginal structural models and causal inference in epidemiology
[39] The central role of the propensity score in observational studies for causal effects
[40] Model of the home food environment pertaining to childhood obesity
[41] Estimating causal effects of treatments in randomized and nonrandomized studies
[42] Randomization analysis of experimental data: The Fisher randomization test comment
[43] Pharmacologic treatments for coronavirus disease 2019 (COVID-19): A review
[44] COVID-19: Combining antiviral and anti-inflammatory treatments
[45] Entropy balancing for continuous treatments
[46] American Community Survey
[47] Final rule: Revisions in the WIC food packages
[48] Ignorability and stability assumptions in neighborhood effects research
[49] Nonparametric estimation of population average dose-response curves using entropy balancing weights for continuous exposures
[50] Causal inference under interference in spatial settings: A case study evaluating community policing program in Chicago
[51] National Establishment Time-Series (NETS) database
[52] Developing an index of dose of exposure to early childhood obesity community interventions
[53] The neighborhood food environment: Sources of historical data on retail food stores
[54] A boosting algorithm for estimating generalized propensity scores with continuous treatments

Acknowledgments
This work was supported in part by funding from National Institutes of Health grant R01 HD072296. We would like to thank May Wang and Michael Prelip, co-principal investigators of the study, for their feedback and help throughout the development of this project. Special thanks to Shannon Whaley and Christopher Anderson from PHFE WIC for providing access to the WIC data and helping to generate catchment areas. Finally, thanks to the members of the ECOSyS team who provided assistance on this project: Shelley Jung, Linghui Jiang, Roch Nianogo, Tabashir Nobari, Kara MacLeod, and Lilly Nhan. The opinions about the data used in the manuscript are solely those of the authors and do not reflect official positions of either government or individual agencies.

Table 1. Each intervention program was classified based on the strategies that were implemented. Programs could use multiple strategies, e.g., Group Education and Counseling. Strategies are categorized as either micro or macro based on whether they directly targeted individuals or the population at large. The final column represents how many of the 32 programs used that particular strategy.