Identifying effects of multiple treatments in the presence of unmeasured confounding

Wang Miao, Wenjie Hu, Elizabeth L. Ogburn, Xiaohua Zhou

2020-11-09

Abstract. Identification of treatment effects in the presence of unmeasured confounding is a persistent problem in the social, biological, and medical sciences. The problem of unmeasured confounding in settings with multiple treatments is most common in statistical genetics and bioinformatics, where researchers have developed many successful statistical strategies without engaging deeply with the causal aspects of the problem. Recently there have been a number of attempts to bridge the gap between these statistical approaches and causal inference, but these attempts have either been shown to be flawed or have relied on fully parametric assumptions. In this paper, we propose two strategies for identifying and estimating causal effects of multiple treatments in the presence of unmeasured confounding. The auxiliary variables approach leverages variables that are not causally associated with the outcome; in the case of a univariate confounder, our method requires only one auxiliary variable, unlike existing instrumental variable methods that would require as many instruments as there are treatments. An alternative null treatments approach relies on the assumption that at least half of the confounded treatments have no causal effect on the outcome, but does not require a priori knowledge of which treatments are null. Our identification strategies do not impose parametric assumptions on the outcome model and do not rest on estimation of the confounder. This paper extends and generalizes existing work on unmeasured confounding with a single treatment and models commonly used in bioinformatics.

1. INTRODUCTION

Identification of treatment effects in the presence of unmeasured confounding is a persistent problem in the social, biological, and medical sciences, where in many settings it is difficult to collect data on all possible treatment-outcome confounders. Identification means that the treatment effect of interest is uniquely determined from the joint distribution of observed variables. Without identification, statistical inference may be misleading and is of limited interest. Most of the work on unmeasured confounding by causal inference researchers focuses on settings with a single treatment, and either harnesses auxiliary variables (e.g., instruments, negative controls, or confounder proxies) to achieve point identification of causal effects, or relies on sensitivity analyses or weak assumptions to derive bounds for the effects of interest. A large body of work from statistical genetics and computational biology is concerned with multiple treatments, for example GWAS (genome-wide association studies) with confounding by population structure, and computational biology applications with confounding by batch effects. Recently, there have been a few attempts to put these approaches on solid theoretical footing and to bridge the gap between these statistical approaches and causal inference. However, these attempts either rely on strong parametric models that circumvent the underlying causal structure, or have been shown to be flawed.
In this paper, we propose two novel strategies for identifying causal effects of multiple treatments in the presence of unmeasured confounding without placing any parametric restrictions on the outcome model. This paper generalizes existing work on unmeasured confounding with a single treatment to the multi-treatment setting, and resolves challenges that have undermined previous proposals for dealing with multi-treatment unmeasured confounding. For a single treatment, a variety of methods have been developed to test, adjust for, and eliminate unmeasured confounding bias. Sensitivity analysis (Cornfield et al., 1959; Rosenbaum and Rubin, 1983a; Ding and VanderWeele, 2014, 2016) and bounding (Manski, 1990; Balke and Pearl, 1997; Richardson and Robins, 2014; Cai et al., 2008; Sjölander, 2009) are used to evaluate the robustness of causal inference to unmeasured confounding. For point identification of the treatment effect, the instrumental variable (IV) is an influential tool used in biomedical, epidemiological, and socioeconomic studies (Wright, 1928; Goldberger, 1972; Robins, 1994; Baker and Lindeman, 1994; Angrist et al., 1996; Didelez and Sheehan, 2007; Small et al., 2017). Recently, Shi et al. (2020), Tchetgen Tchetgen et al. (2020), Lipsitch et al. (2010), Kuroki and Pearl (2014), VanderWeele (2012), de Luna et al. (2017), Flanders et al. (2017), and Wang et al. (2017) demonstrate the potential of using confounder proxy variables and negative controls for adjustment of confounding bias. For an overview of recent work in the single-treatment setting, see Tchetgen Tchetgen et al. (2020) and Wang and Tchetgen Tchetgen (2018). Similar methods can sometimes be used in settings with multiple treatments, simply treating them as a single vector-valued treatment. These approaches allow for unrestricted correlations among the treatments. However, if, as is typically the case in GWAS and computational biology settings, the correlations among treatments contain useful information about the confounding, these methods cannot leverage that information. Latent variable methods leveraging the multi-treatment correlation structure have been used to estimate and control for unmeasured confounders in biological applications since the early 2000s (Alter et al., 2000; Price et al., 2006; Storey, 2007, 2008; Sun et al., 2012; Friguet et al., 2009; Gagnon-Bartsch and Speed, 2012; Luo and Wei, 2019). Recently, a few authors have attempted to elucidate the causal structure underlying these statistical procedures and to establish rigorous theoretical guarantees for identification, using fully parametric models. Wang et al. (2017) propose confounding adjustment approaches for the effects of a treatment on multiple outcomes under a linear factor model; by reversing the labeling of the outcome and treatments, their approaches can test but not identify the effects of multiple treatments on the outcome. Kong et al. (2021) consider a binary outcome with a univariate confounder and prove identification under a linear factor model for multiple treatments and a parametric outcome model via a meticulous analysis of the link distribution; but their approach cannot generalize to the multivariate confounder setting, as we illustrate with a counterexample in the supplement. Grimmer et al. (2020), Ćevid et al. (2020), Guo et al. (2020), and Chernozhukov et al.
(2017) consider linear outcome models with high-dimensional treatments that are confounded or mismeasured; in this case, identification is implied by the fact that the confounding of each treatment vanishes as the number of treatments goes to infinity. In contrast, we take a fundamentally causal approach to confounding and to identification of treatment effects by allowing the outcome model to be unrestricted, the treatment-confounder distribution to lie in a more general, though not unrestricted, class of models, the number of treatments to be finite, and the confounding not to vanish. Most notably, Wang and Blei (2019) provide an intuitive justification for using latent variable methods in general multi-treatment unmeasured confounding settings; they call the justification and resulting method the "deconfounder." Their approach uses a factor model assuming that treatments are independent conditional on the confounder to estimate the confounder, and the confounder estimate is used for adjustment of bias. However, as demonstrated in a counterexample by D'Amour (2019a) and discussed by Ogburn et al. (2019, 2020) and Imai and Jiang (2019), identification is not guaranteed for the deconfounder, i.e., the treatment effects cannot be uniquely determined from the observed data even with an infinite number of data samples. Additionally, an infinite number of treatments is required for consistent estimation of the confounder, complicating finite sample inference and undermining positivity. For refinements and discussions of the deconfounder approach, see Wang and Blei (2020), D'Amour (2019a), Ogburn et al. (2020), Grimmer et al. (2020), and the commentaries (D'Amour, 2019b; Ogburn et al., 2019; Imai and Jiang, 2019; Athey et al., 2019) published alongside Wang and Blei (2019). D'Amour (2019a) suggests proximal inference and Imai and Jiang (2019) consider the conventional instrumental variable approach to facilitate identification. However, if correlations among the multiple treatments are indicative of confounding, as the deconfounder approach assumes, neither of these two methods makes use of that correlation. Moreover, their extension to the multi-treatment setting is complicated by the fact that proximal inference requires confounder proxies to be causally uncorrelated with any of the treatments and the instrumental variable approach requires at least as many instrumental variables as there are treatments. In Section 2, we review the challenges for identifying multi-treatment effects in the presence of unmeasured confounding. In Sections 3 and 4, we propose two novel approaches for the identification of causal effects of multiple treatments with unmeasured confounding: an auxiliary variables approach and a null treatments approach. Both approaches rely on two assumptions restricting the joint distribution of the unmeasured confounder and treatments. The first assumption is that the joint treatment-confounder distribution lies in a class of models that satisfy a particular equivalence property that is known to hold for many commonly used models, e.g., many types of factor and mixture models. This assumption can accommodate other treatment-confounder models, such as mixture models, in addition to the factor models considered by Wang and Blei (2019), Kong et al. (2021), Wang et al. (2017), and Grimmer et al. (2020).
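To make the equivalence property concrete, the following R sketch shows the rotation invariance that underlies it for linear factor models; the dimensions and parameter values are illustrative assumptions, not part of the paper's formal development.

```r
## Minimal sketch: in a linear factor model X = alpha U + eps with Sigma_U = I_q,
## the observed covariance Sigma_X = alpha alpha^T + Sigma_eps is unchanged when
## alpha is replaced by alpha R for any orthogonal R, so f(x) cannot distinguish
## the true loadings from any rotation of them.
set.seed(1)
p <- 6; q <- 2
alpha <- matrix(rnorm(p * q), p, q)                   # true loadings (illustrative)
R <- qr.Q(qr(matrix(rnorm(q * q), q, q)))             # a random orthogonal q x q matrix
Sigma1 <- alpha %*% t(alpha) + diag(p)                # covariance implied by alpha
Sigma2 <- (alpha %*% R) %*% t(alpha %*% R) + diag(p)  # covariance implied by alpha R
max(abs(Sigma1 - Sigma2))                             # zero up to rounding error
```

Because the observed distribution cannot distinguish α from αR, an identification strategy must either be invariant to the rotation, as the approaches developed below are, or pin it down with outside information.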
The second assumption is that the treatment-confounder distribution satisfies a completeness condition that is standard in nonparametric identification problems. In addition to these two assumptions, the auxiliary variables approach leverages an auxiliary variable that does not directly affect the outcome, such as an IV or confounder proxy, to identify treatment effects. In the presence of a univariate confounder, identification can be achieved with our approach even if only one auxiliary variable is available and it is associated with only one confounded treatment. In contrast, IV approaches require as many instrumental variables as there are treatments and require that all confounded treatments be associated with the instrumental variables. The null treatments approach does not require any auxiliary variables, but instead rests on the assumption that at least half of the confounded treatments are null, without requiring knowledge of which are active and which are null. In these two approaches, identification is achieved without imposing parametric assumptions on the outcome model, although the joint treatment-confounder distribution is restricted by the equivalence and the completeness assumptions. Identification does not rest on estimation of the unmeasured confounder, and thus works with a finite number of treatments and does not run afoul of positivity. In the absence of auxiliary variables and if the null treatments assumption fails to hold, our method still constitutes a valid test of the null hypothesis of no joint treatment effect. Because identification in both approaches requires solving an integral equation, an explicit identification formula is not available for unrestricted outcome models. However, we describe some estimation strategies in Section 5. In simulations in Section 6, the proposed approaches perform well with little bias and appropriate coverage rates. In a data example about mouse obesity in Section 7, we apply the approaches to detect genes possibly causing mouse obesity, which reinforces previous findings by taking unmeasured confounding into account. Section 8 concludes with a brief mention of some potential extensions of our approaches. Proofs and further discussions are relegated to the supplement.

2. THE CHALLENGES OF IDENTIFICATION WITH UNMEASURED CONFOUNDING

Throughout the paper, we let X = (X_1, ..., X_p)^T denote a vector of p treatments and Y an outcome. We are interested in the effects of X on Y, which may be confounded by a vector of q unobserved covariates U. The dimension of the confounder, q, is assumed to be known a priori; for the choice of q in practice, see the illustrations in Sections 6-7 and the discussion in Section 8. For notational convenience, we suppress observed covariates; all conditions and results can be viewed as conditioning on them. Hereafter, we use Σ_A to denote the covariance matrix of a random vector A. We use f to denote a generic probability density or mass function and f(A = a | B = b) the conditional density/mass of A given B evaluated at (A = a, B = b), and write f(a | b) for simplicity. Vectors are assumed to be column vectors unless explicitly transposed. We refer to f(x, u) as the treatment-confounder distribution and f(y | u, x) as the outcome model. Let Y(x) denote the potential outcome that would have been observed had the treatment X been set to x. Treatment effects are defined by contrasts of potential outcomes between different treatment conditions, and thus we focus on identification of f{Y(x)}.
We say that f{Y(x)} is identified if and only if it is uniquely determined by the joint distribution of observed variables. Throughout we make three standard identifying assumptions.

Assumption 1. (i) Consistency: the observed outcome is a realization of the potential outcome under the treatment actually received; (ii) ignorability, also called "exchangeability": Y(x) ⊥⊥ X | U, which ensures that treatment assignments are effectively randomized conditional on U and implies that U suffices to control for all confounding; (iii) positivity, also called "overlap": for all values of U, all treatment values have positive probability.

If we were able to observe the confounder U, Assumption 1 would permit fully nonparametric identification of f{Y(x)} by the back-door formula (Pearl, 1995) or the g-formula (Robins, 1986),

f{Y(x) = y} = ∫_u f(y | u, x) f(u) du; (1)

see Rosenbaum and Rubin (1983b) and Hernán and Robins (2020) for an overview of nonparametric identification of treatment effects under these standard assumptions when U is observed. But when U is not observed, all information contained in the observed data is captured by f(y, x), from which one cannot uniquely determine the joint distribution f(y, x, u). In general the observed data contain no information about unobserved confounding, echoing the classical problem that a joint distribution cannot be determined from a marginal one. To be specific, one has to solve for f(x, u) and f(y | u, x) from

f(x) = ∫_u f(x, u) du, (2)
f(y | x) = ∫_u f(y | u, x) f(u | x) du. (3)

However, f(x, u) cannot be uniquely determined from (2), even if, as is common practice, a factor model is imposed on f(x, u); see D'Amour (2019a) for a counterexample. Furthermore, even if f(x, u) is known, the outcome model f(y | u, x) cannot be identified; D'Amour (2019a) points out that the lack of identification of f(y | u, x) is due to the unknown copula of f(y | x) and f(u | x). Here we note that identifying f(y | u, x) given f(u | x) is equivalent to solving the integral equation (3), but the solution is not unique without extra assumptions. As a result, one cannot identify the true joint distribution f(y, x, u) that is essential for the g-formula (1). We call a joint distribution f̃(y, x, u) admissible if it conforms to the observed data distribution f(y, x), i.e., f(y, x) = ∫_u f̃(y, x, u) du. The counterexample by D'Amour (2019a) also shows that different admissible joint distributions result in different potential outcome distributions, i.e., the potential outcome distribution is not identified without additional assumptions. Some previous approaches have estimated U directly with a deterministic function of X, but this controverts the positivity assumption and requires an infinite number of treatments in order to consistently estimate U. Furthermore, Grimmer et al. (2020) show that in these settings the effect of X on Y is asymptotically unconfounded and a naive regression of Y on X weakly dominates these more involved approaches.

3. THE AUXILIARY VARIABLES APPROACH

Suppose we have available a vector of auxiliary variables Z; then the observed data distribution is captured by f(x, y, z), from which we aim to identify the potential outcome distribution f{Y(x)}. Let f(x, u | z; α) denote a model for the treatment-confounder distribution indexed by a possibly infinite-dimensional parameter α, and f(x | z; α) the resulting marginal distribution. Given f(x | z; α), we let f(x, u | z; α̃) denote an arbitrary admissible joint distribution such that f(x | z; α) = ∫_u f(x, u | z; α̃) du, and write f̃(x, u | z) = f(x, u | z; α̃) for short. Our identification strategy rests on the following assumption.
Assumption 2. (i) Exclusion restriction: Z ⊥⊥ Y | (X, U); (ii) equivalence: for any α, any admissible joint distribution satisfies f̃(x, u | z) = f{X = x, V(U) = u | Z = z; α} for some invertible but not necessarily known function V; (iii) completeness: for any α, f(u | x, z; α) is complete in z, i.e., for any fixed x and square-integrable function g, E{g(U) | X = x, Z; α} = 0 almost surely if and only if g(U) = 0 almost surely.

The exclusion restriction characterizing auxiliary variables is the same condition invoked for the treatment-inducing confounder proxies by Tchetgen Tchetgen et al. (2020); it rules out the existence of a direct causal association between the auxiliary variable and the outcome. It is satisfied by instrumental variables and confounder proxies or negative controls. Figure 1 includes two directed acyclic graph (DAG) examples that satisfy the assumption. In Section 3.3 we illustrate the difference between our use of auxiliary variables and previous proposals. Equivalence is a high-level assumption stating that the treatment-confounder distribution lies in a model that is identified up to a one-to-one transformation of U. Because ignorability holds conditional on any one-to-one transformation of U, this allows us to use an arbitrary admissible treatment-confounder distribution to identify the treatment effects. The equivalence property restricts the class of treatment-confounder distributions; as one example, it is not met if the dimension of the confounders exceeds that of the treatments. Nonetheless, the equivalence property admits a large class of models. In particular, it allows for any factor model or mixture model that is identified, where identification in the context of these models does not mean point identification but rather identification up to a rotation (factor models) or up to label switching (mixture models). Such model assumptions are often used in bioinformatics applications where the unmeasured confounder represents population structure (GWAS) or lab batch effects (Wang et al., 2017; Wang and Blei, 2019; Luo and Wei, 2019). Identification results for factor and mixture models are well established (Anderson and Rubin, 1956; Kuroki and Pearl, 2014; Allman et al., 2009; Titterington et al., 1985; Teicher, 1963; Yakowitz and Spragins, 1968). A major limitation of factor models is that they are in general not identified when there are single-treatment confounders or when there are causal relationships among the treatments (Ogburn et al., 2019). However, the equivalence assumption can accommodate models that allow for both of these features. For instance, the equivalence assumption is met for normal mixture models (Teicher, 1963; Yakowitz and Spragins, 1968), which allow for single-treatment confounders and causally dependent treatments and have been used for confounding adjustment in gene expression studies (Luo and Wei, 2019); see the supplement for an example. Completeness is a fundamental concept in statistics (see Lehmann and Scheffé (1950) and Basu (1955)), and primitive conditions are readily available in the literature, including the fact that it holds for very general exponential families of distributions and for many regression models; see, e.g., Powell (2003) and D'Haultfoeuille (2011). Chen et al. (2014) and Andrews (2017) have shown that if Z and U are continuously distributed and the dimension of Z is larger than that of U, then under a mild regularity condition the completeness condition holds generically, in the sense that the set of distributions for which completeness fails has a property analogous to having zero Lebesgue measure.
By appealing to such results, completeness holds in a large class of distributions, and thus one may argue that it is commonly satisfied. The role of completeness in this paper is analogous to its wide use in a variety of nonparametric and semiparametric identification problems, for instance, in IV regression (Newey and Powell, 2003; Darolles et al., 2011), IV quantile regression (Chernozhukov and Hansen, 2005; Chen et al., 2014), measurement error problems (Hu and Schennach, 2008), missing data (Miao and Tchetgen Tchetgen, 2016; D'Haultfoeuille, 2010), and proximal inference. Our completeness assumption means that, conditional on X, any variability in U is captured by variability in Z, analogous to the relevance condition in instrumental variable identification. It is most easily understood in the categorical case. For a binary confounder, completeness holds if U and Z are correlated within each level of X. When both U and Z have k levels, completeness means that the k × k matrix [f(u_i | x, z_j)] consisting of the conditional probabilities is invertible. This is stronger than dependence of Z and U given X. Roughly speaking, dependence reveals that variability in U is accompanied by variability in Z, and completeness reinforces that any infinitesimal variability in U is accompanied by variability in Z. As a consequence, completeness in general fails if the number of levels or dimension of Z is smaller than that of U or if Z is a coarsening of U. In practice, completeness is more plausible if practitioners measure a rich set of potential auxiliary variables for the purpose of confounding adjustment. In the usual case that the dimension of U is much smaller than that of X, the dimension of Z can also be small. Completeness can be checked in specific models, for instance, in the categorical case. However, Canay et al. (2013) show that for unrestricted models the completeness condition is in fact untestable. In the supplement, we further elaborate the discussion of completeness and provide both positive and negative examples to facilitate its interpretation and use in practice. Proposition 1 formalizes the equivalence and completeness conditions for the linear factor model that is widely used in GWAS and computational biology applications.

Proposition 1. Consider a factor model X = αU + ηZ + ε for a vector of p observed variables X, q unobserved confounders U, and r (≥ q) instrumental variables Z such that Z ⊥⊥ U ⊥⊥ ε. Without loss of generality we let E(U) = 0, Σ_U = I_q, E(ε) = 0, and Σ_ε be diagonal. Assuming there remain two disjoint submatrices of rank q after deleting any row of α, we have that (i) αα^T and Σ_ε are uniquely determined from Σ_{X−ηZ} = αα^T + Σ_ε, and any admissible value for α can be written as α̃ = αR with R an arbitrary q × q orthogonal matrix; (ii) if the components of ε are mutually independent and the joint characteristic function of X does not vanish, then any admissible joint distribution can be written as f̃(x, u | z) = f(X = x, R^T U = u | Z = z; α) with R an arbitrary q × q orthogonal matrix; (iii) if U, Z, and ε are normal variables and η^T γ has full rank q, where γ = Σ_{X−ηZ}^{-1} α, then f(u | x, z; α) is complete in z.

The first result follows from Anderson and Rubin (1956) by noting that η is identified by the regression of X on Z; the third result can be obtained from the completeness property of exponential families (Newey and Powell, 2003, Theorem 2.2); and we prove the second result in the supplement using a lemma of Kotlarski (1967). The first two results hold without Z, i.e., when η = 0.
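The following R sketch illustrates Proposition 1(i); the sample size, dimensions, and loading values are illustrative assumptions. It estimates η by regressing X on Z, factor-analyzes the residuals with factanal, and checks that αα^T is recovered even though α itself is determined only up to rotation.

```r
## Illustrative check of Proposition 1(i): after regressing out Z, the residuals
## follow the factor model alpha U + eps, and alpha alpha^T (but not alpha) is
## recoverable from their covariance.
set.seed(2)
n <- 1e5; p <- 6; q <- 2; r <- 2
alpha <- cbind(c(1, 0.8, 0.6, 0.4, 0.2, 0.1), c(0, 0.3, 0.5, 0.7, 0.9, 0.4))
eta <- matrix(runif(p * r, 0.2, 0.8), p, r)
Z <- matrix(rnorm(n * r), n, r); U <- matrix(rnorm(n * q), n, q)
X <- U %*% t(alpha) + Z %*% t(eta) + matrix(rnorm(n * p), n, p)

eta_hat <- t(coef(lm(X ~ Z))[-1, ])          # p x r coefficients of X on Z
W <- X - Z %*% t(eta_hat)                    # residuals: alpha U + eps
fa <- factanal(W, factors = q)
alpha_hat <- diag(apply(W, 2, sd)) %*% unclass(fa$loadings)  # rescale to covariance units
max(abs(alpha_hat %*% t(alpha_hat) - alpha %*% t(alpha)))    # small: alpha alpha^T recovered
```

Any orthogonal rotation of alpha_hat gives the same fit, which is exactly the ambiguity that the equivalence assumption permits and that Theorem 1 below works around.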
This proposition demonstrates that, for the linear factor model, any admissible value for α must be some rotation of the true value and any admissible treatment-confounder distribution must be the joint distribution of X and some rotation of U. Proposition 1 requires that p ≥ 2q + 1 and that each confounder is correlated with at least three observed variables, and therefore implies "no single- or dual-treatment confounders." This is stronger than the "no single-treatment confounder" assumption of Wang and Blei (2019); however, Grimmer et al. (2020) argue that in fact Wang and Blei (2019) require the much stronger assumption of "no finite-treatment confounding." Leveraging auxiliary variables gives the following identification result.

Theorem 1. Under Assumptions 1 and 2, for any admissible joint distribution f̃(x, u | z) that solves f(x | z) = ∫_u f̃(x, u | z) du, there exists a unique solution f̃(y | u, x) to the equation

f(y | x, z) = ∫_u f̃(y | u, x) f̃(u | x, z) du, (4)

and the potential outcome distribution is identified by

f{Y(x) = y} = ∫_u f̃(y | u, x) f̃(u) du. (5)

Although the equivalence and completeness assumptions impose restrictions on the treatment-confounder distribution f(x, u | z), the outcome model f(y | u, x) is left unrestricted in the sense that the parameter space of f(y | u, x) is all possible conditional densities of Y given X and a q-dimensional confounder U. Theorem 1 depicts three steps of the auxiliary variables approach. First we obtain an arbitrary admissible distribution f̃(x, u | z); then by solving equation (4) we identify f̃(y | u, x), which encodes the treatment effect within each stratum of the confounder; and finally we integrate the stratified effect to obtain the treatment effect in the population. The auxiliary variables approach does not estimate the confounder, or even a surrogate confounder, and thus dispenses with the need for an infinite number of treatments and avoids the forced positivity violations that were described by D'Amour (2019a,b) and Ogburn et al. (2020). The auxiliary variable is indispensable in the second stage of the approach; without it one has to solve f(y | x) = ∫_u f̃(y | u, x) f̃(u | x) du for the outcome model, and the solution to this equation is not unique given f(y | x) and f̃(u | x). However, by incorporating an auxiliary variable satisfying the exclusion restriction, we obtain equation (4), a Fredholm integral equation of the first kind (Kress, 1989, chapter 15). The solution of this equation is unique under the completeness condition and thus identifies the outcome model, up to an invertible transformation of the confounder. Equation (4) also offers testable implications for Assumption 2: if the equation does not have a solution, then Assumption 2 must be partially violated. Unlike the g-formula (1), we do not identify the true outcome model f(y | u, x) or the true confounder distribution, but only counterparts defined with respect to an invertible transformation of U. Nonetheless, for any such admissible pair of outcome model and treatment-confounder distribution, we can still identify the potential outcome distribution, because ignorability holds conditional on any such transformation of U. The equivalence assumption guarantees that any admissible distribution f̃(x, u | z) can be used for identifying the potential outcome distribution; we do not need to use the truth f(x, u | z) and thus bypass the challenge of identifying it. Although Theorem 1 shows that the potential outcome distribution is identified, the integral equation (4) does not admit an analytic solution in general and one has to resort to numerical methods. For instance, Chae et al.
(2019) provide an estimation algorithm that is conjectured to provide a consistent estimator of the unknown function under mild conditions. Nonetheless, in certain special cases, a closed-form identification formula can be derived.

Example 1. Suppose p treatments, one confounder, one instrumental variable, and one outcome are generated as X = αU + ηZ + ε and Y = m(X, U, e), where (ε^T, U, Z) is a vector of independent normal variables with mean zero, Σ_U = 1, m is unknown, and e ⊥⊥ (ε^T, U, Z). We require that at least three entries of α are nonzero and that η^T γ ≠ 0, in which case the equivalence and completeness assumptions are met according to Proposition 1. Given an admissible value α̃, the corresponding f̃(u | x, z) is normal, and the solution to (4) and the resulting potential outcome distribution can be expressed in closed form via the Fourier transforms of the standard normal density function φ and of f(y | x, z), where i = (−1)^{1/2} denotes the imaginary unit. The detailed derivation for Example 1 is deferred to the supplement.

If a linear outcome model is assumed and the structural causal parameter is of interest, then identification and estimation are simplified, as we discuss in Section 5.2. In the supplement, we include an additional example where identification rests on a confounder proxy variable.

3.3. Comparison with existing approaches

We briefly describe the difference between our auxiliary variables approach and the instrumental variable and negative control or proximal inference approaches. In addition to the exclusion restriction (Z ⊥⊥ Y | (X, U)), the instrumental variable approach requires additional assumptions to achieve identification, such as an additive outcome model that does not allow for interaction of the treatment and the confounder, E(Y | u, x) = m(x) + u, as well as completeness in z of f(x | z) (Newey and Powell, 2003). Alternative strands of the IV literature for confounding adjustment include nonseparable outcome models (Imbens and Newey, 2009; Chernozhukov and Hansen, 2005; Wang and Tchetgen Tchetgen, 2018) and local average treatment effect models (Angrist et al., 1996; Ogburn et al., 2015). However, these two approaches typically focus on a single (or binary) treatment and we are not aware of any extensions to multiple treatments. Therefore, we compare our approach to the additive model, which has a straightforward extension to multiple treatments. In contrast to the additive model, our approach does not rest on outcome model restrictions and hence accommodates interactions. Moreover, the completeness of f(x | z) in z entails at least as many instrumental variables as there are confounded treatments, and requires each confounded treatment to be correlated with at least one instrumental variable. But for our auxiliary variables approach, completeness of f(u | x, z) in z requires the dimension of Z to be only as great as that of U, which can be much smaller than the number of treatments. In the special case of a single confounder, completeness of f(u | x, z) can more plausibly be achieved with only one auxiliary variable and with only one treatment correlated with it. The negative control or proximal inference approach (Shi et al., 2020; Tchetgen Tchetgen et al., 2020) allows for unrestricted outcome models, but entails at least two confounder proxies (W, Z) as well as the exclusion restrictions W ⊥⊥ (X, Z) | U and Z ⊥⊥ Y | (X, U), even in the single confounder setting.
This approach additionally assumes the existence of a function h(w, y, x), called the confounding bridge function, such that f(y | u, x) = ∫_w h(w, y, x) f(w | u) dw, i.e., h(w, y, x) suffices to depict the relationship between the confounding effects on Y and on W. The integral equation is solved for the confounding bridge h(w, y, x), and the potential outcome distribution is obtained by f{Y(x) = y} = ∫_w h(w, y, x) f(w) dw. A strength of these two approaches is that they leave the treatment-confounder distribution unrestricted. But when the correlation structure of multiple treatments is informative about the presence and nature of confounding, as is generally the case in GWAS and computational biology applications, our method can exploit this correlation structure to remove the confounding bias, while the conventional instrumental variable and proximal inference approaches are agnostic to the treatment-confounder distribution and therefore unable to leverage any information it contains.

4. THE NULL TREATMENTS APPROACH

Without auxiliary variables, we let f(x, u | α) denote a model for the treatment-confounder distribution and f(x; α) the resulting marginal distribution, indexed by a possibly infinite-dimensional parameter α. Let C = {i : f(u | x; α) varies with x_i} denote the indices of the confounded treatments, those associated with the confounder, and A = {i : f(y | u, x) varies with x_i} the active ones that affect the outcome. A key feature of this identification strategy is that the analyst does not need to know which treatments are confounded or which are active. We make the following assumptions.

Assumption 3. (i) Null treatments: the cardinality of the intersection C ∩ A does not exceed (|C| − q)/2, where |C| is the cardinality of C and must be larger than q, the dimension of U; (ii) equivalence: for any α, any admissible joint distribution satisfies f̃(x, u) = f{X = x, V(U) = u; α} for some invertible but not necessarily known function V; (iii) completeness: for any α, f(u | x; α) is complete in x_S for any subset S ⊂ C with cardinality q.

Analogous to Assumption 2, the equivalence and completeness assumptions restrict the treatment-confounder distribution, but hold for certain well-known classes of models like factor or mixture models. Under the equivalence assumption, one can identify the confounded treatments set C by using an arbitrary admissible joint distribution f̃(x, u) without knowing the truth. The null treatments assumption entails that fewer than half of the confounded treatments can have causal effects on the outcome, but does not require knowledge of which treatments are active. The assumption is reasonable in many empirical studies where only a few of the treatments can causally affect the outcome. For example, in GWAS, or in analyzing electronic health record databases for off-label drugs that improve COVID-19 outcomes, one may expect that most treatments are null without knowing which are active. A counterpart of the null treatments assumption was previously considered by Wang et al. (2017) in the context of effects of a treatment on multiple outcomes. They consider a linear factor model Y = αU + βX + ε for p outcomes Y and show that β is identified if only a small proportion of its elements are nonzero. By reversing the roles of treatments and outcome, their approach can be adapted to test but not to identify multi-treatment effects, because the coefficient in the regression of treatments on the outcome and the confounder is not the causal effect of interest. Another related concept is the s-sparsity (Kang et al., 2016) used in Mendelian randomization, which assumes that at most s single nucleotide polymorphisms can directly affect the outcome of interest.
In contrast, the null treatments assumption does not necessarily require the effects to be sparse and imposes no restrictions on the unconfounded treatments. Leveraging the null treatments assumption gives the following identification result.

Theorem 2. Under Assumptions 1 and 3, for any admissible joint distribution f̃(x, u) that solves f(x) = ∫_u f̃(x, u) du, there exists a unique solution f̃(y | u, x) to the equation

f(y | x) = ∫_u f̃(y | u, x) f̃(u | x) du, (6)

and the potential outcome distribution is identified by

f{Y(x) = y} = ∫_u f̃(y | u, x) f̃(u) du. (7)

This theorem states that treatment effects are identified if fewer than half of the confounded treatments can affect the outcome. The outcome model is left unrestricted except for the restriction imposed by the null treatments assumption. Analogous to the auxiliary variables approach, the equivalence assumption allows us to use an arbitrary admissible treatment-confounder distribution for identification, the null treatments assumption allows us to construct Fredholm integral equations of the first kind to solve for the outcome model, and the completeness assumption guarantees the uniqueness of the solution. Without the null treatments assumption, the solution to (6) is not unique, as we noted above. Theorem 2 holds if we replace q with an integer s ≥ q in Assumption 3, in which case completeness is weakened but the null treatments assumption is strengthened. If the average treatment effect is of interest, we could instead define the active treatments set as A = {i : E(Y | u, x) varies with x_i}. If a linear outcome model is assumed and the structural parameter is of interest, identification and estimation are simplified; we demonstrate this in Section 5.3. To illustrate, the following is a simple example where Assumption 3 holds and identification is achieved. We are not aware of any previous identification results for this setting, in particular when no parametric assumption is imposed on the outcome model.

Example 2. Suppose p treatments, one confounder, and one outcome are generated as X = αU + ε and Y = m(X, U, e), where (ε^T, U) is a vector of independent normal variables with mean zero, Σ_U = 1, m is unknown, and e ⊥⊥ (ε, U). Suppose the entries of α are nonzero and m can depend on at most (p − 1)/2 treatments, but we do not know which ones. In this setting, because the entries of α are nonzero, the equivalence assumption is met (Proposition 1 with η = 0) and all entries of γ must be nonzero (Lemma 2 in the supplement); thus, f(u | x) is complete in each treatment. As a result, the potential outcome distributions and treatment effects are identified.

The null treatments approach proceeds by first obtaining an admissible f̃(x, u), then finding the solution to (6), and finally calculating f{Y(x)} from (7). Equation (6) cannot be solved directly, as we do not know which treatments are null. As an informal, heuristic intuition for how this identification strategy works in this example, note that if we knew the identity of any one of the null treatments we could treat it as an auxiliary variable and apply the auxiliary variables approach. In the absence of such knowledge we can imagine using each X as an auxiliary variable; under the null treatments assumption, more than half of the X's must give the same, correct solution to (6). We describe a constructive method that formalizes this idea. Let x_S denote an arbitrary q-dimensional subvector of x_C and x_S̄ the remaining components of x; we then solve

f(y | x) = ∫_u f̃(y | u, x_S̄) f̃(u | x) du (8)

for f̃(y | u, x_S̄) for all choices of x_S.

Proposition 2.
Under Assumptions 1 and 3, for any choice of x_S, if the solution to (8) exists and depends on at most (|C| − q)/2 of the confounded treatments, then it must solve (6).

An immediate extension of the null treatments approach provides a test of the sharp null hypothesis of no joint treatment effects, which requires neither auxiliary variables nor the null treatments assumption. The sharp null hypothesis H_0 states that f(y | u, x) does not vary with x.

Proposition 3. Under Assumption 1 and (ii)-(iii) of Assumption 3, and given an admissible joint distribution f̃(x, u), if the null hypothesis H_0 is correct, then for any q-dimensional subvector x_S of the confounded treatments x_C, the solution to the equation

f(y | x) = ∫_u f̃(y | u, x_S̄) f̃(u | x) du (9)

exists and is unique, and the solution must satisfy the equality

f(Y = y) = ∫_u f̃(y | u, x_S̄) f̃(u) du. (10)

This result allows us to construct valid hypothesis tests even in the absence of auxiliary variables or a commitment to the null treatments assumption. Under Assumption 1 and (ii)-(iii) of Assumption 3, evidence against the existence of a solution to (9) or against the equality (10) is evidence against H_0. The proof is immediate by noting that, under the null hypothesis H_0, the null treatments assumption is trivially satisfied. Thus, if H_0 is correct, the solution to (9) does not depend on x, and the right-hand side of (10), which identifies the potential outcome distribution f{Y(x) = y}, must be equal to the observed outcome distribution f(Y = y).

5. ESTIMATION

In this section we adhere to a common principle of causal inference, and indeed of statistics more broadly, that is nicely summed up by Cox and Donnelly (2011) (via Imai and Jiang, 2019):

If an issue can be addressed nonparametrically then it will often be better to tackle it parametrically; however, if it cannot be resolved nonparametrically then it is usually dangerous to resolve it parametrically. (p. 96)

Having proven identification without recourse to parametric outcome models, we will see that estimation does, in fact, require them. This is similar to myriad other causal inference problems, where nonparametric identification is commonly followed by estimation via parametric or sometimes semiparametric models. One key difference is that, in the absence of parametric assumptions, there is no closed-form identifying expression for treatment effects in this setting, because in general there is no closed-form solution to the integral equations that are involved in identification. We first describe an estimation procedure in full generality and then introduce some choices of parametric outcome models that make it feasible in practice.

5.1. General estimation algorithms

Theorem 1 points to the following auxiliary variables algorithm for estimation of f{Y(x)}.

Aux-1 Obtain an arbitrary admissible joint distribution f̃(x, u, z).
Aux-2 Use the estimate from Step 1, along with an estimate of f(y | x, z), to solve equation (4) for an estimate of f̃(y | u, x).
Aux-3 Plug the estimate of f̃(y | u, x) from Step 2 and the estimate of f̃(u) derived from Step 1 into equation (5) to estimate f{Y(x)}.

Theorem 2 points to the null treatments algorithm, with a similar set of steps for estimation of f{Y(x)}.

Null-1 Obtain an arbitrary admissible joint distribution f̃(x, u).
Null-2 Use the estimate of f̃(u | x) from Step 1, along with an estimate of f(y | x), to solve equation (6) for an estimate of f̃(y | u, x). The constructive method described in Proposition 2 can be implemented to solve (6).
Null-3 Plug the estimate of f̃(u) from Step 1 and f̃(y | u, x) from Step 2 into equation (7) to estimate f{Y(x)}.

Steps Aux-1 and Null-1 require only the equivalence assumption, which places nontrivial restrictions on the treatment-confounder distribution. To estimate f̃(x, u), one needs to correctly specify a treatment-confounder model that meets the equivalence assumption, such as a factor or mixture model. Under standard factor or mixture models, estimation of f̃(x, u) is well established, and we refer to the large existing body of literature for estimation techniques and properties (Anderson and Rubin, 1956; Yalcin and Amemiya, 2001; Basilevsky, 2009; Kim and Mueller, 1978; Aitkin and Rubin, 1985; Titterington et al., 1985). Note that, crucially, Step 1 estimates the distribution of U (jointly with X), but does not estimate U itself. This is advantageous as it does not require an infinite number of treatments and does not engender the resulting positivity problems. Steps Aux-2 and Null-2 involve, first, estimating f(y | x) or f(y | x, z), which can be done parametrically or nonparametrically using standard density estimation techniques. More challenging is the second step, solving integral equations (4) and (6), which do not admit analytic solutions in general. These equations are of the form of Fredholm integral equations of the first kind (Kress, 1989, chapter 15), where f̃(u | x, z) (for the auxiliary variables approach) or f̃(u | x) (for the null treatments approach) is the known kernel and f̃(y | u, x) is an unknown function of u to be solved for. This kind of equation is known to be ill-posed due to noncontinuity of the solution and difficulty of computation. In the context of nonparametric instrumental variable regression, regularization methods have been established to solve such equations, and we refer to Ai and Chen (2003), Newey and Powell (2003), Carrasco et al. (2007), and Chen and Qiu (2016) for a broad view of this problem. Numerical solution of such equations is an active area of mathematical and statistical research and is largely beyond the scope of this paper. However, we note that Chae et al. (2019) provide R code for a numerical method that is conjectured to provide a consistent estimator of the unknown function under mild conditions. In the next subsections, we describe modeling assumptions under which the integral equation can be avoided altogether.
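To give a flavor of the regularization strategies cited above, the following R sketch solves a discretized Fredholm equation of the first kind by Tikhonov regularization. It is only a numerical illustration under assumed ingredients: the Gaussian kernel stands in for the known kernel f̃(u | x, z) at a fixed x, the grids and the penalty λ are arbitrary choices, and no claim is made that this matches the method of Chae et al. (2019).

```r
## Illustrative Tikhonov solution of a discretized Fredholm equation of the
## first kind, g(z) = \int k(z, u) h(u) du, where h plays the role of the
## unknown f(y | u, x) and k the known kernel f(u | x, z).
u_grid <- seq(-4, 4, length.out = 200); du <- u_grid[2] - u_grid[1]
z_grid <- seq(-3, 3, length.out = 100)
k <- outer(z_grid, u_grid, function(z, u) dnorm(u - 0.8 * z))  # assumed kernel
h_true <- dnorm(u_grid, mean = 1)            # "unknown" function, used to build g
g <- as.vector(k %*% h_true) * du            # noiseless left-hand side
A <- k * du                                  # discretized integral operator
lambda <- 1e-6                               # regularization penalty (illustrative)
h_hat <- solve(t(A) %*% A + lambda * diag(length(u_grid)), t(A) %*% g)
max(abs(h_hat - h_true))                     # approximately recovers h_true
```

Without the penalty, t(A) %*% A is singular and the reconstruction is unstable, which is the ill-posedness noted above; with real data, λ must balance this instability against regularization bias.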
Steps Aux-3 and Null-3 are essentially applications of the g-formula; estimation of this integral is standard in causal inference problems. Below we provide two examples of how parametric models, in this case linear, can obviate the need to solve integral equations, permit estimation using standard software, and admit consistent estimators.

5.2. Linear models with instrumental variables

Consider the following model for a p-dimensional treatment X, a q-dimensional confounder U, and an r-dimensional instrumental variable Z, with p ≥ 2q + 1 and r ≥ q:

X = αU + ηZ + ε, with Z ⊥⊥ U ⊥⊥ ε, E(U) = 0, Σ_U = I_q, E(ε) = 0, and Σ_ε diagonal; (11)
there remain two disjoint submatrices of rank q after deleting any row of α; (12)
η^T γ has full rank q, where γ = Σ_{X−ηZ}^{-1} α. (13)

This is the IV setting from Example 1. Intercepts are not included in the models, as one can center (X, Y, Z) to have mean zero. If (ε^T, U, Z) are normally distributed, then (11)-(12) imply equivalence and (13) implies completeness in Theorem 1, and as a special case, identification and estimation of f{Y(x)} follow from Theorem 1 and the auxiliary variables algorithm, respectively. However, the estimation procedure below works even when the error distributions are left unspecified, in which case (11)-(13) do not suffice for identification of f{Y(x)}. We additionally assume the linear outcome model

Y = β^T X + δ^T U + ε_Y, with E(ε_Y | X, Z, U) = 0, (14)

and focus on the structural parameter β rather than the potential outcome distribution f{Y(x)}. Then (11)-(13), viewed as a parallel version of Assumption 2, guarantee identification of β. Condition (13) in principle can be tested after obtaining estimators of η and γ. Note that r may be smaller than p, in which case the conventional IV approach is not applicable. Estimation of β is parallel to the auxiliary variables algorithm. We first obtain η̂ by the regression of X on Z and obtain γ̂ by factor analysis of the residuals X − η̂Z; there must exist some orthogonal matrix R so that γ̂ converges to γR. This corresponds to Aux-1. Let (ξ_X, ξ_Z) denote the coefficients of the regression of Y on (X, Z) and (ξ̂_X, ξ̂_Z) the corresponding estimator. Noting that ξ_X = β + γδ and ξ_Z = −η^T γδ, we solve (ξ̂_X, ξ̂_Z) for (β̂, δ̂). This corresponds to Aux-2: estimation of f(y | x, z) is replaced by a linear regression of Y on (X, Z), and solving the integral equation for f̃(y | u, x) is replaced by solving linear equations for the finite-dimensional parameters (β, δ). We finally obtain β̂ = ξ̂_X + γ̂(γ̂^T η̂ η̂^T γ̂)^{-1} γ̂^T η̂ ξ̂_Z. In the special case that the dimension of the instrumental variable Z equals that of the confounder U, we obtain β̂ = ξ̂_X + γ̂(η̂^T γ̂)^{-1} ξ̂_Z. Consistency and asymptotic normality follow from those of (ξ̂_X, ξ̂_Z, γ̂, η̂). Routine R software such as factanal and lm can be used for the factor analysis and linear regression, respectively, and the variance of the estimators can be bootstrapped.

5.3. Linear models under the null treatments assumption

Consider the linear models

X = αU + ε, Y = β^T X + δ^T U + ε_Y, (16)

where U and ε are not necessarily normally distributed. The coefficient β, encoding the average treatment effects, is of interest. We let γ = Σ_X^{-1} α, which denotes the coefficients of the regression of U on X. Hereafter, we use A_i to denote the ith row of a matrix or a column vector A. Let C = {i : α_i is not a zero vector} denote the indices of the confounded treatments, |C| the cardinality of C, and γ_C the submatrix consisting of the corresponding rows of γ. We have the following result.

Theorem 3. The parameter β is identified under model (16) and the following assumptions: (i) at most (|C| − q)/2 entries of β_C are nonzero; (ii) after deleting any row, there remain two disjoint submatrices of α of full rank; (iii) any submatrix of γ_C consisting of q rows has full rank.

Condition (iii) of Theorem 3 in principle can be tested after obtaining an estimator of γ. Estimation of β is parallel to the null treatments algorithm. Let ξ denote the coefficients of the regression of Y on X. Given n independent and identically distributed samples, we first obtain γ̂ by factor analysis of X; this corresponds to Null-1. Then we obtain ξ̂ by the regression of Y on X, which can be viewed as a crude estimator of β with asymptotic bias γδ. Note that ξ_C = β_C + γ_C δ for the confounded treatments; then, under the null treatments assumption (i) of Theorem 3, estimation of δ can be cast as a standard robust linear regression (Maronna et al., 2019; Rousseeuw and Leroy, 2005) given ξ̂ and γ̂: γ̂_C is the design matrix and ξ̂_C the vector of observations, with outliers corresponding to the nonzero entries of β_C. This corresponds to Null-2, where the complexity of solving integral equations is resolved by robust linear regression. Specifically, given an n^{1/2}-consistent estimator (ξ̂, γ̂), we solve

δ̂_lms = arg min_δ median_{i∈Ĉ} (ξ̂_i − γ̂_i δ)²,

which is a least median of squares estimator minimizing the median of the squared errors (ξ̂_i − γ̂_i δ)² among the confounded treatments consistently selected by Ĉ.
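Both linear procedures can be carried out with routine R functions, as in the following minimal sketch on simulated data; the data-generating values, dimensions, and sample size are illustrative assumptions, and the asymptotic-normality refinement discussed below is omitted.

```r
## Minimal sketch of the two linear estimation procedures (illustrative DGP).
library(MASS)  # provides lqs for least median of squares
set.seed(3)
n <- 5000; p <- 6; q <- 1
alpha <- c(0.8, 0.7, 0.6, 0.5, 0.4, 0.3)  # all entries nonzero: all treatments confounded
beta <- c(1, -0.5, 0, 0, 0, 0)            # two active treatments, within (|C| - q)/2
delta <- 1
U <- rnorm(n); Z <- rnorm(n)
eta <- c(0.5, 0.4, 0.3, 0.2, 0.1, 0.6)    # loadings of the single IV (r = q = 1)

## Section 5.2: auxiliary variables estimator
X <- outer(U, alpha) + outer(Z, eta) + matrix(rnorm(n * p), n, p)
Y <- as.vector(X %*% beta + delta * U + rnorm(n))
eta_hat <- coef(lm(X ~ Z))["Z", ]                               # regression of X on Z
W <- X - outer(Z, eta_hat)
fa <- factanal(W, factors = q)                                  # Aux-1
gamma_hat <- solve(cov(W), diag(apply(W, 2, sd)) %*% unclass(fa$loadings))
fit <- lm(Y ~ X + Z)                                            # Aux-2
xi_X <- coef(fit)[2:(p + 1)]; xi_Z <- coef(fit)[p + 2]
beta_aux <- xi_X + gamma_hat %*% solve(t(eta_hat) %*% gamma_hat, xi_Z)

## Section 5.3: null treatments estimator (no IV)
X0 <- outer(U, alpha) + matrix(rnorm(n * p), n, p)
Y0 <- as.vector(X0 %*% beta + delta * U + rnorm(n))
fa0 <- factanal(X0, factors = q)                                # Null-1
gamma0 <- solve(cov(X0), diag(apply(X0, 2, sd)) %*% unclass(fa0$loadings))
xi <- coef(lm(Y0 ~ X0))[-1]                                     # crude estimate, bias gamma delta
delta_lms <- coef(lqs(xi ~ gamma0 - 1, method = "lms"))         # Null-2: robust regression
beta_null <- xi - gamma0 %*% delta_lms
cbind(beta, beta_aux, beta_null)                                # both close to the truth
```

The least median of squares step works because more than half of the coordinates of ξ̂ fall on the line γ̂δ, so the active treatments show up as outliers; both estimators are invariant to the rotation (here, sign) ambiguity of the factor loadings.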
For the robust linear regression step, the least quantile of squares, least trimmed squares, and the S-estimator can also be used to solve for δ; we refer to Rousseeuw and Leroy (2005, chapter 3.4). The corresponding estimate of β is β̂_lms = ξ̂ − γ̂δ̂_lms. In the supplement, we show consistency of (δ̂_lms, β̂_lms) under the assumptions of Theorem 3 and n^{1/2}-consistency of (ξ̂, γ̂). However, they are not necessarily asymptotically normal, as we observe in numerical experiments, even if (ξ̂, γ̂) are. Therefore, to promote asymptotic normality we use β̂_lms as an initial value to select the null treatments and to update the estimates of δ and β as follows: obtain δ̂ by solving ξ̂_i = γ̂_i δ̂ (by ordinary least squares) over the indices corresponding to the smallest (|Ĉ| + q)/2 entries of |β̂_lms|, and obtain the ultimate estimator β̂ = ξ̂ − γ̂δ̂. Routine R software factanal and lqs can be used for the factor analysis and robust linear regression, respectively, and the variance of the ultimate estimator can be bootstrapped.

6. SIMULATIONS

We evaluate the performance of the proposed methods via simulations. For the auxiliary variables setting, 2 confounders (U), 6 instrumental variables (Z), 6 treatments (X), an outcome (Y), and 2 outcome-inducing confounder proxies (W) are generated from a linear-normal structural equation model with outcome Y = β^T X + δ_Y^T U + ε_Y. Under this setting, (X_2, ..., X_6) are confounded but X_1 is not. For estimation, we consider eight methods:

IV_1: the conventional IV approach using all six IVs;
IV_2: the conventional IV approach using five IVs (Z_1, ..., Z_5) and treating (X_6, Z_6) as covariates;
Aux_1: the proposed auxiliary variables approach assuming two factors and using all six IVs;
Aux_2: the auxiliary variables approach assuming two factors, using (Z_5, Z_6) as IVs and (Z_1, ..., Z_4) as covariates;
Aux_3: the auxiliary variables approach with one factor, using Z_6 as an IV and (Z_1, ..., Z_5) as covariates;
PI_1: proximal inference using (Z_5, Z_6) and (W_1, W_2) as the treatment- and outcome-inducing confounder proxies, respectively, and (Z_1, ..., Z_4) as covariates;
PI_2: proximal inference using Z_6 and W_1 as the treatment- and outcome-inducing confounder proxies, respectively, and (Z_1, ..., Z_5) as covariates;
OLS: ordinary least squares regression of Y on X and Z.

Two-stage least squares is used in the IV and PI methods. Because Grimmer et al. (2020) have shown that the deconfounder (Wang and Blei, 2019) is asymptotically equivalent to and does not outperform OLS, we do not include a separate comparison with the deconfounder. We replicate 1000 simulations at sample sizes 1000 and 2000. Figure 2 summarizes the bias of the estimators of each parameter. As expected, methods IV_1, Aux_1, Aux_2, and PI_1 perform well with little bias for estimation of all parameters, because a sufficient number of IVs or confounder proxies is used in these four methods and the number of factors is correctly specified in Aux_1 and Aux_2. Note that Aux_2 uses only two IVs, while IV_1 uses all six IVs and PI_1 uses two additional confounder proxies. We also compute the bootstrap confidence interval and evaluate the coverage probability for Aux_1 and Aux_2, summarized in Table 1. The 95% bootstrap confidence interval has coverage probabilities close to the nominal level of 0.95 under a moderate sample size. When using the same set of IVs, we might expect Aux_1 to be more efficient than IV_1, as the former additionally incorporates the internal dependence structure of the treatments.
However, in our simulations, when we increase the sample size to 10000 and compute the mean squared error, we observe that Aux_1 has more variability than IV_1 for estimation of (β_3, β_5) and that Aux_1 has more variability than Aux_2 for estimation of β_5. This can also be seen from the boxplots of the bias in Figure 2, and may be because the proposed auxiliary variables estimation is not efficient. Therefore, in the future it is of great interest to theoretically establish the semiparametric efficiency bound and the efficient estimator under the auxiliary variables assumption and to compare them to competing methods. In contrast, IV_2 fails to consistently estimate β_6 because the corresponding IV (Z_6) is not used as an instrument but treated as a covariate; surprisingly, IV_2 is also biased for estimation of β_1, which is not confounded and can be estimated very well by all the other methods. Likewise, PI_2 is biased for estimation of (β_2, ..., β_5) due to an insufficient number of confounder proxies. Nonetheless, PI_2 has little bias for estimation of β_6. This is because Z_6 used in PI_2 is a valid IV for X_6 and the PI method is consistent even if the confounder proxies are inadequate, a property shown in previous work on proximal inference. Except for β_1, Aux_3 is biased for estimation of the confounded parameters because the number of factors and IVs are incorrectly specified. As expected, OLS is biased for estimation of all parameters except β_1. We also evaluate the performance of the estimators when the IVs have a direct effect on the outcome. In this case, the outcome model is changed to Y = β^T X + λZ + δ_Y^T U + ε_Y with λ = (0.2, ..., 0.2, 0.3), while the other settings of the data-generating process remain the same. Figure 3 summarizes the bias of the estimators. All estimators are biased for estimation of the confounded treatment effects, and no estimator outperforms the others in terms of bias. However, the IV estimators are also biased for estimation of β_1, while the other estimators have little bias. In summary, the conventional IV, proximal inference, and proposed auxiliary variables approaches all rely on the exclusion restriction assumption. If the exclusion restriction and the required conditions are satisfied, each of these approaches can properly address multi-treatment confounding. The conventional IV approach entails at least as many IVs as treatments, otherwise it can be biased even for unconfounded treatments; proximal inference needs at least as many treatment- and outcome-inducing proxies as confounders; the proposed auxiliary variables approach does not rest on outcome-inducing confounder proxies, but it does rest on a factor model and correct specification of the number of factors. Therefore, to obtain a reliable causal conclusion in practice, we recommend implementing and comparing multiple applicable methods.

For the null treatments setting, we generate 2 confounders (U), 8 treatments (X), and an outcome (Y) from a linear-normal model of the same form but without instrumental variables. We consider two choices for β: Case 1, β_1 = β_2 = β_3 = 1 and β_4 = ... = β_8 = 0; Case 2, β_1 = β_2 = β_3 = 1, β_4 = β_5 = 0.2, and β_6 = β_7 = β_8 = 0. Under these settings, (X_2, ..., X_8) are confounded but X_1 is not; the null treatments assumption is satisfied in Case 1, as only two of the confounded treatments are active, but it is violated in Case 2, where β_4 and β_5 also have a small effect on Y. For estimation, three methods are used: the null treatments estimation with the correct dimension of the confounder (Null_1), the null treatments estimation with one confounder (Null_2), and OLS.
Because no auxiliary variables are generated, we do not compare to the IV, auxiliary variables, or proximal inference approaches. We replicate 1000 simulations at sample sizes 2000 and 5000. Figures 4 and 5 summarize the bias of the estimators for Case 1 and Case 2, respectively. As expected, for estimation of β_1, which is not confounded, both OLS and the null treatments approach have little bias in both cases, even if the number of confounders is not correctly specified. In Case 1, where the null treatments assumption is met, Null_1 has little bias because the number of confounders is correctly specified, and, as shown in Table 2, the 95% bootstrap confidence interval based on Null_1 has coverage probabilities close to the nominal level of 0.95. But Null_2 is biased because the dimension of the confounder is specified to be smaller than the truth. The bias is smaller than that of OLS, although this is not theoretically guaranteed. If the dimension of the confounder is specified to be larger than the truth, the factor analysis fails and so does the null treatments estimation. In Case 2, the null treatments assumption is violated because more than half of the confounded treatments are active. In this case, the null treatments estimation is in general biased for the confounded treatments; the bias could be larger or smaller than that of OLS. Therefore, to apply the null treatments estimation, one has to first ensure that the majority of the treatment effects are null or very close to zero. Both the auxiliary variables and the null treatments approaches rest on correct specification of the number of unmeasured confounders; if it is not known with high confidence, we refer to Bai and Ng (2002) for consistent estimation of the number of factors.

7. APPLICATION TO MOUSE OBESITY DATA

For further illustration, we reanalyze a mouse obesity dataset described by Wang et al. (2006), where the effect of gene expressions on the body weight of F2 mice is of interest. As argued by Leek et al. (2010), unmeasured confounding may arise in such gene expression studies due to batch effects or unmeasured phenotypes. The data we use are collected from 227 mice and include the body weight (Y), 17 gene expressions (X), and 5 single nucleotide polymorphisms (Z); see the supplement for a complete list of these variables. As previously selected by Lin et al. (2015), the 17 genes are likely to affect mouse weight and the 5 single nucleotide polymorphisms are potential instrumental variables. We further evaluate the effects of these genes on mouse weight by adopting a factor model that is widely used to characterize unmeasured confounding in gene expression studies (Gagnon-Bartsch and Speed, 2012; Wang et al., 2017). We assume a linear outcome model and estimate the parameters with three methods: the auxiliary variables approach using the single nucleotide polymorphisms as instrumental variables, the null treatments approach assuming that fewer than half of the genes can affect mouse weight, and ordinary least squares. We also compute the bootstrap confidence intervals. Several of the detected genes are supported by previous findings: Igfbp2 (insulin-like growth factor binding protein 2) protects against the development of obesity (Wheatcroft et al., 2007); Gpld1 (phosphatidylinositol-glycan-specific phospholipase D) is associated with the change in insulin sensitivity in response to the low-fat diet (Gray et al., 2008); and Irx3 (Iroquois homeobox gene 3) is associated with lifestyle changes and plays a crucial role in determining weight via energy balance (Schneeberger, 2019).
However, the hypothesis that one factor is sufficient is rejected by the significance test (Kim and Mueller, 1978, chapter IV), indicating that either the number of factors is too small or the model is misspecified. When the number of factors is increased to two, as shown in Figure 7, Gstm2, 2010002N04Rik, Igfbp2, and Avpr1a remain significant in the auxiliary variables analysis, and Gstm2 and Dscam remain significant in the null treatments analysis. When the number of factors is increased to three, no estimate in either the auxiliary variables or the null treatments analysis remains significant. In summary, the association between the 17 genes and mouse obesity can be explained by three or more unmeasured confounders; but if there exist only one or two confounders, then Gstm2, Sirpa, 2010002N04Rik, Dscam, Igfbp2, Avpr1a, Abca8a, Irx3, and Gpld1 have a potential causal association with mouse obesity that cannot be completely attributed to confounding. In future studies, it is of interest to identify the potential confounders and to use additional data to more confidently estimate the effects of these 9 genes.

In this paper, we extend results that had previously been developed for the identification of treatment effects in the presence of unmeasured confounding in the single-treatment setting to the multi-treatment setting, and we extend the parametric approach of Wang et al. (2017) to identification of multi-treatment effects with an unrestricted outcome model. We demonstrate a novel framework using integral equations and completeness conditions to establish identification in causal inference problems. We have assumed that the number of confounders is known, which is realistic in confounder measurement error or misclassification problems. When it is not known a priori, consistent estimation of the number of confounders has been well established by Bai and Ng (2002) under factor models and by Chen et al. (2012) under mixture models. The R function factanal provides a significance test of whether a given number of factors suffices to capture the full dimensionality of the data, as illustrated in the sketch below. We also refer to Owen et al. (2016) for a bi-cross-validation approach to choosing the number of factors.
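The following R sketch illustrates this test on simulated treatments with two true factors; the data generating values are arbitrary and for illustration only.

    # The likelihood ratio test reported by factanal() assesses whether
    # k factors suffice; we increase k until the test is not rejected.
    set.seed(1)
    U <- matrix(rnorm(500 * 2), 500, 2)            # two true factors
    lam <- matrix(runif(2 * 8, 0.5, 1.5), 2, 8)    # arbitrary loadings
    X <- U %*% lam + matrix(rnorm(500 * 8), 500, 8)
    for (k in 1:3) {
      fit <- factanal(X, factors = k)
      cat(k, "factor(s): p-value =", signif(fit$PVAL, 3), "\n")
    }
    # Expect rejection at k = 1 and no rejection at k = 2.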
Our identification framework rests on the auxiliary variables or the null treatments assumption. These assumptions are partially testable: a heuristic approach, taking equation (4) as an example, is to check whether a solution exists. This can be done by obtaining a solution f̃(y | u, x) that minimizes the mean squared error ||f(y | x, z) − ∫_u f̃(y | u, x) f̃(u | x, z) du||² and checking how far the error is from zero, to assess how well the solution fits the equation. This is a typical goodness-of-fit test if all models are parametric; however, in nonparametric models, statistical inference for the integral equation is quite difficult, and we leave the development of falsification tests to future research. Even if both the auxiliary variables and the null treatments assumptions fail to hold, we describe how to test whether treatment effects exist. The proposed estimation strategies can also be used to test whether unmeasured confounding is present, by assessing how far the proposed estimates are from the crude ones. Our identification results lead to feasible estimation methods under parametric estimation assumptions. The proposed estimation methods, comprising standard factor analysis and linear and robust linear regression, inherit properties from the classical theory of statistical inference (e.g. Anderson and Rubin, 1956; Maronna et al., 2019; Newey and McFadden, 1994); these methods work well in simulations and an application. However, statistical inference for nonparametric and semiparametric models remains to be studied. We have considered fixed dimensions of treatments and confounders, and extensions to large and high-dimensional settings are of both theoretical and practical interest.

Note that η can be identified by regression of X on Z; then, applying Lemma 5.1 and Theorem 5.1 of Anderson and Rubin (1956) to the factor model for the residuals, we obtain (i) of Proposition 1. The third result of Proposition 1 can be obtained from the well-known completeness property of exponential families; see Theorem 2.2 of Newey and Powell (2003). Here we prove (ii), which rests on the following lemma described by Kotlarski (1967, lemma 1 and remark 5).

Lemma 1 (Kotlarski, 1967). Let U, ε_1, and ε_2 be three independent q-dimensional real random vectors with mean zero, and let W_1 = U + ε_1 and W_2 = U + ε_2. If the joint characteristic function of (W_1, W_2) does not vanish, then the distributions of U, ε_1, and ε_2 are uniquely determined from the joint distribution of (W_1, W_2).

We apply Kotlarski's lemma to prove (ii) of Proposition 1.

Proof of Proposition 1 (ii). We denote W = X − ηZ = αU + ε. From (i) of Proposition 1, any admissible value for α can be written as α̃ = αR with R an arbitrary q × q orthogonal matrix, so we only need to prove that, given α̃ = αR, the joint distribution f̃(w, u) = f(W = w, U = u; α̃) is uniquely determined and f̃(w, u) = f(W = w, R^T U = u; α). Because after deleting any row of α there remain two full-rank submatrices of α, there must exist two disjoint square submatrices of α with full rank q. Because α̃ = αR, there must likewise exist two disjoint square submatrices of α̃ with full rank q, which we denote by α̃_I and α̃_J, with I and J the corresponding row indices. Noting that W_I = α̃_I V + ε_I and W_J = α̃_J V + ε_J with V = R^T U, we have α̃_I^{−1} W_I = V + α̃_I^{−1} ε_I and α̃_J^{−1} W_J = V + α̃_J^{−1} ε_J. According to Lemma 1, the distributions of V, α̃_I^{−1} ε_I, and α̃_J^{−1} ε_J are uniquely determined given α̃, and therefore the distribution of ε = W − α̃V is uniquely determined. As a result, given α̃, there is only one admissible joint distribution, which must be f̃(w, u) = f(W = w, R^T U = u; α).

Proof. Under the equivalence (Assumption 2 (ii)), for any admissible joint distribution f̃(x, u | z), the distribution f̃(u | x, z) is complete in z, so the solution to (S.2) is unique; this is because any candidate solutions f̃_1(y | u, x) and f̃_2(y | u, x) to (S.2) must satisfy ∫_u {f̃_1(y | u, x) − f̃_2(y | u, x)} f̃(u | x, z) du = 0, which implies that f̃_1(y | u, x) = f̃_2(y | u, x) by the completeness of f̃(u | x, z) in z. Thus, f̃(y | u, x) is uniquely determined from (S.2), and f{Y(x)} is identified by plugging it into (S.1).

Proof. Under the equivalence (Assumption 3 (ii)), for any admissible joint distribution f̃(x, u), let C̃ = {i : f̃(u | x) varies with x_i}; then we must have C = C̃, i.e., C can be identified from any admissible joint distribution f̃(x, u). Because V(U) is invertible, the ignorability assumption 1 (Y(x) ⊥⊥ X | U) implies that Y(x) ⊥⊥ X | V(U), and the completeness (Assumption 3 (iii)) implies that f̃(u | x) is also complete in x_S for any S ⊂ C with cardinality q. Letting f̃(y | u, x) = f{y | V(U) = u, x}, we have that f(y | x) = ∫_u f̃(y | u, x) f̃(u | x) du, which is equation (S.4). We prove that f̃(y | u, x) is uniquely determined from (S.4), given f(y | x) and f̃(u | x), by way of contradiction.
Suppose two candidate outcome models f̃_1(y | u, x) and f̃_2(y | u, x) satisfy (S.4); then ∫_u {f̃_1(y | u, x) − f̃_2(y | u, x)} f̃(u | x) du = 0. Under the null treatments assumption, each of f̃_1(y | u, x) and f̃_2(y | u, x) can depend on at most (|C| − q)/2 confounded treatments, and thus the contrast {f̃_1(y | u, x) − f̃_2(y | u, x)} can depend on at most |C| − q confounded treatments. We let X_S denote the remaining q confounded treatments that the contrast {f̃_1(y | u, x) − f̃_2(y | u, x)} does not depend on; then the completeness (Assumption 3 (iii)) implies that f̃(u | x_S, x_S̄) is complete in x_S, and thus {f̃_1(y | u, x) − f̃_2(y | u, x)} = 0 almost surely, i.e., f̃_1(y | u, x) = f̃_2(y | u, x) almost surely. Therefore, the solution to (S.4) must be unique. Finally, plugging f̃(y | u, x) and f̃(u) into (S.3) identifies the potential outcome distribution.

Proof. We first note that the confounded treatments can be identified under the equivalence assumption, by the argument in the proof of Theorem 2. Because candidate solutions depending on more than (|C| − q)/2 confounded treatments contradict the null treatments assumption that at most (|C| − q)/2 confounded treatments can affect the outcome, we focus only on solutions that depend on at most (|C| − q)/2 confounded treatments. We let A_C denote the number of active confounded treatments. Consider a solution f̃(y | u, x_B) that solves f(y | x) = ∫_u f̃(y | u, x_B) f̃(u | x) du (S.5), where |B ∩ C| ≤ (|C| − q)/2, i.e., x_B includes at most (|C| − q)/2 confounded treatments. Equation (S.5) is a Fredholm integral equation of the first kind with the kernel f̃(u | x) complete in x_B̄, where x_B̄ denotes the remaining treatments of x other than x_B. For x_B that includes all active treatments, i.e., x_A ⊂ x_B and |B ∩ C| = A_C, the solution to (S.5) exists and must be unique and equal to f̃(y | u, x_A). For x_B that includes t < A_C active confounded treatments, i.e., |B ∩ C| = t < A_C, the solution to (S.5) does not exist. We prove this by way of contradiction. Suppose (S.5) has a solution f̃(y | u, x_B); then it must also satisfy equation (S.6), in which the unknown function f̃(y | u, x_B ∪ x_A) of u is allowed to depend on all active treatments. Note that such a function can depend on at most (|C| − q)/2 − t additional null confounded treatments, so that the kernel remains complete in the remaining q + t null confounded treatments. Therefore, (S.6) can be satisfied by only one function, which in fact is f̃(y | u, x_A). This contradicts the premise that f̃(y | u, x_B) depends on only t < A_C active confounded treatments. As a result, all solutions that depend on at most (|C| − q)/2 confounded treatments must be equal to f̃(y | u, x_A), i.e., the solution to (6).

We first describe a lemma that is useful for the proof of Theorem 3.

Lemma 2. For a p × p positive-definite matrix Σ_ε and a p × q matrix α of full column rank with p > q, (Σ_ε + αα^T)^{−1} α = Σ_ε^{−1} α (I_q + α^T Σ_ε^{−1} α)^{−1}.

Proof. Letting A = Σ_ε^{−1/2} α and B = Σ_ε^{1/2} (Σ_ε + αα^T)^{−1} α, then A has full column rank, and it is straightforward to verify that (I_p + AA^T)^{−1} A = A (I_q + A^T A)^{−1}. Thus, we have B = (I_p + AA^T)^{−1} A = A (I_q + A^T A)^{−1}, i.e., (Σ_ε + αα^T)^{−1} α = Σ_ε^{−1} α (I_q + α^T Σ_ε^{−1} α)^{−1}. In the special case where Σ_ε is diagonal with diagonal entries σ_i² and q = 1, the ith entry of (Σ_ε + αα^T)^{−1} α equals α_i / {σ_i² (1 + α^T Σ_ε^{−1} α)}, which is zero if and only if α_i = 0.

We then prove Theorem 3.

Proof of Theorem 3. Under model (16) and condition (i) of Theorem 3, Σ_ε is identified and any admissible value α̃ is a rotation of the truth (Proposition 1), i.e., α̃ = αR for some q × q orthogonal matrix R. We let γ = Σ_X^{−1} α denote the coefficient of the linear regression of U on X. Given an admissible value α̃ = αR, we let γ̃ = Σ_X^{−1} α̃ = γR and δ̃ = R^T δ. We let C = {i : α_i is not a zero vector} denote the confounded treatments, where α_i is the ith row of α.
Because Σ_ε is diagonal under model (16), Lemma 2 implies that γ̃_i is a zero vector if and only if α_i is a zero vector, i.e., C is identified by the set {i : γ̃_i is not a zero vector}. Letting ξ denote the ordinary least squares coefficient of the regression of Y on X, we have ξ = β + γ̃δ̃ (S.7). By way of contradiction, we prove that (β, δ̃) is uniquely determined from this equation, given (ξ, γ̃), under the null treatments assumption. Suppose that two sets of values (β^(1), δ̃^(1)) and (β^(2), δ̃^(2)) satisfy (S.7), and both β_C^(1) and β_C^(2) satisfy the null treatments assumption that at most (|C| − q)/2 entries are nonzero. For the unconfounded treatments, the corresponding rows of α̃ and γ̃ must be zero, as shown above; thus β_C̄^(1) = β_C̄^(2) = ξ_C̄. It remains to prove that β_C^(1) = β_C^(2) and δ̃^(1) = δ̃^(2). From (S.7), we have that β_C^(1) − β_C^(2) = γ̃_C (δ̃^(2) − δ̃^(1)). On the left hand side of this equation, β_C^(1) − β_C^(2) has at least q zero entries under assumption (i) of Theorem 3. We use Z to denote the indices of the zero entries of β_C^(1) − β_C^(2) and γ̃_Z the corresponding submatrix of γ̃_C; then we have 0 = γ̃_Z (δ̃^(2) − δ̃^(1)). Because γ̃_Z = γ_Z R must have full rank q under assumption (iii) of Theorem 3, we must have δ̃^(2) = δ̃^(1), and thus β_C^(1) = β_C^(2). In summary, β^(1) = β^(2), i.e., β is uniquely determined.

Completeness is a fundamental concept in statistics (see Lehman and Scheffe (1950); Basu (1955)) that is taught in most foundational courses on statistical inference. It has been used to establish the theory of hypothesis testing and unbiased estimation in mathematical statistics (Lehman and Scheffe, 1950), and it has recently been used to establish identification in causal inference, missing data, and measurement error problems. Nonetheless, it may still seem abstract to practitioners, so it is worth explaining in more detail. We add further explanation and extra examples to facilitate the interpretation and use of the completeness condition in practice. In particular, we illustrate completeness from the following perspectives.

• The role of completeness in identification. Beyond its prevalent use in classical statistics, completeness has been widely used to establish identification for a variety of nonparametric and semiparametric models, for instance, the IV regression model (Newey and Powell, 2003; Darolles et al., 2011), the IV quantile model (Chernozhukov and Hansen, 2005; Chen et al., 2014), the measurement error model (Hu and Schennach, 2008), the missing data model (Miao and Tchetgen Tchetgen, 2016; D'Haultfoeuille, 2010), and proximal inference. It has been very well studied by statisticians and economists. In simple settings, for instance where the variables are categorical or where the outcome model to be identified is parametric, completeness reduces to concrete and familiar conditions. In the categorical case where both U and Z have k levels, completeness means that the matrix [f(u_i | x, z_j)] consisting of the conditional probabilities is invertible for any x. This is stronger than dependence of Z and U given X: roughly speaking, dependence reveals that variability in U is accompanied by variability in Z, whereas completeness requires that any infinitesimal variability in U is accompanied by variability in Z. For instance, if Z is a proxy of U, completeness of f(u | x, z) can be interpreted as no coarsening in the measurement Z of the confounder U. As a consequence, completeness fails if the number of levels or the dimension of Z is smaller than that of U. In the binary case, completeness holds if U and Z are correlated within each level of X; a numerical illustration of the rank condition is given below.
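The following R sketch checks the rank condition in the categorical case; the conditional probabilities are hypothetical, chosen only to illustrate when completeness holds and when it fails.

    # Rows index levels of U, columns index levels of Z; each column of M
    # is a hypothetical conditional distribution f(u | x, z) at a fixed x.
    M <- rbind(c(0.7, 0.2, 0.1),
               c(0.2, 0.6, 0.2),
               c(0.1, 0.2, 0.7))
    qr(M)$rank == nrow(M)    # TRUE: full row rank, completeness holds
    # A coarsened proxy: f(u_1 | x, z) = f(u_2 | x, z) for all z, so the
    # variability from u_1 to u_2 is not captured by Z even though U and Z
    # remain dependent; the rank condition fails.
    M2 <- rbind(c(0.4, 0.1, 0.25),
                c(0.4, 0.1, 0.25),
                c(0.2, 0.8, 0.5))
    qr(M2)$rank == nrow(M2)  # FALSE: completeness fails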
In the linear model E(U | x, Z) = γ_0(x) + γ_1(x)Z, completeness reduces to the rank condition that γ_1(x) has full row rank for all x. The rank condition can hold only if the dimension of Z is no smaller than that of U. This argument provides a rationale for measuring a rich set of potential auxiliary variables for the purpose of confounding adjustment. However, if the outcome model is unrestricted, completeness serves as a generic rank condition accommodating both categorical and continuous variables and both linear and nonlinear models, although it can no longer be expressed as concisely as a full rank condition.

• How to assess or test completeness. Completeness can be checked in specific models; for instance, one can check whether the covariance matrix is of full rank in the joint normal model. Unfortunately, Canay et al. (2013) show that for unrestricted models the completeness condition is in fact untestable, even if all relevant variables (X, Z, U in our problem) are observed. Therefore, without restrictions on the distribution, it is impossible to provide empirical evidence in favor of the completeness condition, akin to the situation for the ignorability assumption.

• When does completeness hold or fail, and is it a stringent condition? A number of papers (Andrews, 2017; D'Haultfoeuille, 2011; Newey and Powell, 2003; Darolles et al., 2011; Chen et al., 2014; Hu and Shiu, 2018) have established genericity results for parametric, semiparametric, and nonparametric distributions satisfying completeness. Andrews (2017) has shown that if Z and U are continuously distributed and the dimension of Z is larger than that of U, then under a mild regularity condition the completeness condition holds generically, in the sense that the set of distributions or conditional expectation operators for which completeness fails has a property analogous to having zero Lebesgue measure (Chen et al., 2014; Andrews, 2017). By appealing to such results, completeness holds in a large class of distributions, and thus one may argue that it is commonly satisfied.

In short, completeness is one of the most general conditions used for identification. It requires that Z have sufficient dimension or levels and sufficient variability relative to U. Commonly used parametric and semiparametric models, such as exponential families (Newey and Powell, 2003, Theorem 2.2) and location-scale families (Mattner, 1992; Hu and Shiu, 2018), as well as nonparametric additive models (D'Haultfoeuille, 2011), satisfy the completeness condition. For nonparametric models it is not testable, but it holds in a large class of models. In the following, we provide extra examples illustrating completeness; see also Lehman and Scheffe (1950) for a variety of parametric examples where completeness holds, as well as counterexamples. We also refer to Newey and Powell (2003) for completeness of exponential families, Hu and Shiu (2018) for location-scale families, and D'Haultfoeuille (2011) and Darolles et al. (2011) for additive separable regression models.

Example S.1. The binary case. Suppose both Z and U are binary; then for any x, completeness of f(u | x, z) holds as long as U and Z are dependent given X = x, and it fails if U ⊥⊥ Z | X = x.

Example S.2. The categorical case. Suppose U has q levels and Z has r levels; then for a given x, completeness of f(u | x, z) in z holds as long as the q × r matrix [f(u_i | x, z_j)] consisting of the conditional probabilities has full row rank. Therefore, it is necessary that q ≤ r and that U and Z are dependent given X = x.
Otherwise, completeness fails if either q > r or U ⊥⊥ Z | X = x. However, for q > 2 the full rank condition is stronger than dependence. For instance, if f(u_1 | z, x) ≠ f(u_2 | z, x) and f(u_3 | z, x) = f(u_1 | z, x) for all z, then U and Z are dependent given X = x but the full rank condition is obviously not met. This is because the variability in U from u_1 to u_3 is not sufficiently captured by Z, i.e., the measurement Z is coarsened if we view it as a proxy of U.

Example S.3. Gaussian distributions. Suppose U and Z have dimensions q and r, respectively, and f(u, z | x) is jointly normal given x; then completeness of f(u | x, z) in z reduces to a rank condition on the coefficient matrix of z in the conditional mean of U. In particular, completeness fails for f(u | x, z) ∼ N(0, σ²_{x,z}): the conditional density is then an even function of u, so that E{g(U) | x, z} = 0 for any square-integrable odd function g. In this example, the scale or magnitude of the variability of U is captured by Z, but not its orientation.

S.3. CONSISTENCY OF THE LEAST MEDIAN OF SQUARES ESTIMATOR (δ̂_lms, β̂_lms)

We show consistency of (δ̂_lms, β̂_lms) under the assumptions of Theorem 3 and n^{1/2}-consistency of (ξ̂, γ̂), i.e., assuming that n^{1/2}(ξ̂ − ξ) and n^{1/2}(γ̂ − γR) are bounded in probability for some unknown orthogonal matrix R. We show that δ̂_lms → R^T δ and β̂_lms → β. For notational simplicity, we consider only the special case where R is the identity matrix; for general cases with R unknown, the following proof holds by simply replacing δ with R^T δ and γ with γR. Because n^{1/2}(γ̂ − γ) is bounded in probability, for γ_i ≠ 0 we have ||γ̂_i||₂² → ||γ_i||₂² > 0, so that eventually ||γ̂_i||₂² > log(n)/n, while for γ_i = 0 we have {n/log(n)} ||γ̂_i||₂² → 0. Lemma 2 implies C = {i : ||α_i||₂² > 0} = {i : ||γ_i||₂² > 0}, and therefore pr(Ĉ = C) → 1, i.e., Ĉ consistently selects the confounded treatments. Letting δ̃_lms denote the estimator computed with the true index set C in place of Ĉ, we only need to show consistency of δ̃_lms, because δ̂_lms = δ̃_lms on the event Ĉ = C. Note that, for any δ and sufficiently large sample size n, ξ̂_i − ξ_i and γ̂_i − γ_i are close to zero, so that {ξ̂_i − ξ_i − (γ̂_i − γ_i)δ}² < {ξ̂_j − ξ_j − (γ̂_j − γ_j)δ + β_j}² for any i with β_i = 0 and any j with β_j ≠ 0. Assumption (i) of Theorem 3 states that more than half of the entries of β are zero, and thus the median of the squared residuals is asymptotically attained among the null treatments. Letting ∆ = δ̃_lms − δ and noting again that more than half of the entries of β are zero, consistency of (ξ̂, γ̂) implies that the corresponding median of squared residuals converges to zero, and thus γ_i ∆ → 0 for all i ∈ C. Moreover, γ_C has full column rank (assumption (iii) of Theorem 3), so ∆ must converge to zero, i.e., δ̃_lms is consistent, and as a result δ̂_lms is consistent. Consistency of β̂_lms = ξ̂ − γ̂ δ̂_lms follows from consistency of (ξ̂, γ̂, δ̂_lms); a numerical sketch of this estimator is given below.
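The least median of squares step can be carried out with standard software. The following R sketch mimics the estimator using MASS::lqs on stand-in values of (ξ̂, γ̂); the dimensions and coefficient values are illustrative, and the replication code may differ in details.

    # delta is estimated by a least median of squares regression of xi_hat
    # on gamma_hat, and beta_hat is recovered as the residual.
    library(MASS)
    set.seed(1)
    p <- 15; q <- 2
    gamma <- matrix(rnorm(p * q), p, q)            # stand-in for gamma_hat
    beta  <- c(rep(0, 10), rep(1, 5))              # more than half of the entries are zero
    delta <- c(2, -1)
    xi <- as.vector(beta + gamma %*% delta) + rnorm(p, sd = 0.01)  # stand-in for xi_hat
    fit <- lqs(gamma, xi, intercept = FALSE, method = "lms")
    delta_lms <- coef(fit)
    beta_lms  <- as.vector(xi - gamma %*% delta_lms)
    round(rbind(beta, beta_lms), 2)                # beta_lms recovers the sparse beta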
Without the aid of the auxiliary variables or null treatments assumptions, identification is not generally available and depends on specific model assumptions. In a recent note, Kong et al. (2021) consider a binary outcome model with one confounder. Under a factor model for normally distributed treatments and additional assumptions, such as knowing the sign of the confounding bias, they prove identification via a meticulous analysis of the link distribution. However, their identification results do not generalize to the multivariate confounder case, as illustrated by the following counterexample.

Example S.5. Assume that X = αU + ε and f(Y = 1 | X, U) = G(β^T X + δ^T U), with U ∼ N(0, I_q) independent of ε ∼ N(0, Σ_ε), where U is a q-dimensional confounder, Σ_ε is diagonal, and G is a known distribution function relating the outcome mean to a linear model of the treatments and confounder. The unknown parameters (β, δ) capture the treatment effects and the magnitude of confounding, respectively. Under this setting, one can verify that the observed data distribution f(x, y) is also compatible with α̃ = αR_1, δ̃ = R_1^T Σ^{−1/2} R_2 Σ^{1/2} δ, and β̃ = β + γ(I_q − Σ^{−1/2} R_2 Σ^{1/2})δ, where Σ = I_q − α^T Σ_X^{−1} α and R_1, R_2 are arbitrary q × q orthogonal matrices. In the special case where U is univariate, i.e., q = 1, there are only two possible values, −1 and 1, for the orthogonal matrices R_1, R_2; thus, there are at least two possible values for the treatment effect, β̃ = β and β̃ = β + 2γδ. If, further, the signs of δ and of at least one entry of α are known, i.e., R_1 = R_2 = 1, then β̃ = β + 2γδ can be excluded, and in fact, Kong et al. (2021) have shown that β̃ = β is the only possible value for the treatment effect provided that G is not a normal distribution function. However, this argument does not generalize to the multivariate confounder case, because there are infinitely many orthogonal matrices of dimension q ≥ 2, in which case it is impossible to pin down R_1 and R_2.

Note that η can be identified by regression of X on Z. Given η and an arbitrary admissible value α̃, and letting φ denote the probability density function of N(0, 1), we solve equation (S.10) for h(y, x, u), which is the outcome model f̃(y | x, u). Following the procedure described by , h(y, x, u) can be represented in terms of the Fourier transforms of f̃(u | x, z) and f(y | x, z). By the substitutions z̃ = {γ̃^T x − γ̃^T ηz}/σ and ũ = u/σ, and by letting g denote the resulting transformed density, we obtain an integral equation of convolution type, which can be solved by applying the Fourier transform. Letting h_1 and h_2 denote the Fourier transforms of φ and g, respectively, with i = (−1)^{1/2} the imaginary unit, we have that h_2(y, x, t) equals h_1(t) multiplied by the Fourier transform of h(y, x, ·), from which h is recovered by division and Fourier inversion.

For simplicity, we consider a binary confounder U, p binary treatments, and a binary proxy Z of the confounder. We assume that X_1 ⊥⊥ · · · ⊥⊥ X_p | U (S.12); that at least three treatments are correlated with U (S.13); and that Z is correlated with U and Z ⊥⊥ (X, Y) | U (S.14), where the last independence is known as the nondifferential error assumption (Ogburn and VanderWeele, 2013; Carroll et al., 2006). Under (S.12)–(S.14), completeness of f(u | x, z) in z holds as long as Z is correlated with U, because E{g(U) | x, z} = 0 for all z ⇔ Σ_u g(u) f(x, u) f(z | u) = 0 for all z ⇔ g(u) f(x, u) = 0 for all u ⇔ g(u) = 0. According to Kuroki and Pearl (2014), under (S.12)–(S.13) any admissible joint distribution f̃(x, u) equals the joint distribution of X and some label switching of U. Given f̃(x, u), we solve f(z, x) = Σ_u f̃(z | u) f̃(x, u) to obtain f̃(z | u) and f̃(x, z, u) = f̃(x, u) f̃(z | u). We then obtain f̃(y | u, x) by solving f(y | x, z) = Σ_u f̃(y | u, x) f̃(u | x, z), and finally the potential outcome distribution is identified by f{Y(x) = y} = Σ_u f̃(y | u, x) f̃(u); a numerical sketch of the solving step is given below. The identification result can be generalized to the categorical setting, and we refer to Kuroki and Pearl (2014) for details of the factor analysis in this case.

S.5.3 A normal mixture model that satisfies the equivalence assumption

Example S.6 (Yakowitz and Spragins, 1968, Proposition 2). Suppose U has q categories with f(U = u_i) = π_i and X is a p-dimensional vector with f(x | u_i) ∼ N(µ_i, Σ_i). Assuming that the pairs (µ_i, Σ_i) are all distinct, f(x) has a unique representation as a normal mixture f(x) ∼ Σ_{i=1}^{q̃} π̃_i N(µ̃_i, Σ̃_i): we must have q̃ = q, and for each i there must exist some j such that π̃_i = π_j and (µ̃_i, Σ̃_i) = (µ_j, Σ_j). That is, the equivalence holds and f(x, u) is identified up to a label switching of the confounder.
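Returning to the binary proxy setting above, the following R sketch solves f(y | x, z) = Σ_u f̃(y | u, x) f̃(u | x, z) by matrix inversion at a fixed x; all probabilities are made up for illustration.

    # With binary U and Z, the equation is a 2 x 2 linear system, and
    # completeness is exactly invertibility of M.
    M <- rbind(c(0.8, 0.3),     # M[u, z] = f(u | x, z); columns sum to one
               c(0.2, 0.7))
    a <- c(0.9, 0.4)            # truth: f(Y = 1 | u, x), unknown in practice
    b <- as.vector(a %*% M)     # observed: f(Y = 1 | x, z)
    a_rec <- as.vector(b %*% solve(M))
    all.equal(a_rec, a)         # TRUE: the outcome model is restored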
References

Efficient estimation of models with conditional moment restrictions containing unknown functions
Estimation and hypothesis testing in finite mixture models
Identifiability of parameters in latent structure models with many observed variables
Singular value decomposition for genome-wide expression data processing and modeling
Statistical inference in factor analysis
Examples of L2-complete and boundedly-complete distributions
Identification of causal effects using instrumental variables
The blessings of multiple causes
Determining the number of factors in approximate factor models
The paired availability design: A proposal for evaluating epidural analgesia during labor
Bounds on treatment effects from studies with imperfect compliance
Statistical Factor Analysis and Related Methods: Theory and Applications
On statistics independent of a complete sufficient statistic
Bounds on direct effects in the presence of confounded intermediate variables
On the testability of identification in some nonparametric models with endogeneity
Linear inverse problems in structural econometrics: estimation based on spectral decomposition and regularization
Measurement Error in Nonlinear Models: A Modern Perspective
Spectral deconfounding via perturbed sparse linear models
On an algorithm for solving Fredholm integrals of the first kind
Inference on the order of a normal mixture
Local identification of nonparametric and semiparametric models
Methods for nonparametric and semiparametric regressions with endogeneity: A gentle guide
An IV model of quantile treatment effects
A lava attack on the recovery of sums of dense and sparse signals
Smoking and lung cancer: Recent evidence and a discussion of some questions
Principles of Applied Statistics
On multi-cause causal inference with unobserved confounding: Counterexamples, impossibility, and alternatives
Comment: Reflections on the deconfounder
Nonparametric instrumental regression
Proxy variables and nonparametric identification of causal effects
A new instrumental method for dealing with endogenous selection
On the completeness condition in nonparametric instrumental problems
Mendelian randomization as an instrumental variable approach to causal inference
Generalized Cornfield conditions for the risk difference
Sensitivity analysis without assumptions
A new method for partial correction of residual confounding in time-series and other observational studies
A factor model approach to multiple testing under dependence
Using control genes to correct for unwanted variation in microarray data
Structural equation methods in the social sciences
Plasma glycosylphosphatidylinositol-specific phospholipase D predicts the change in insulin sensitivity in response to a low-fat but not a low-carbohydrate diet in obese women
Naive regression requires weaker assumptions than factor models to adjust for multiple cause confounding
Doubly debiased lasso: High-Dimensional inference under hidden confounding and measurement errors
Causal Inference: What If
Instrumental variable treatment of nonclassical measurement error models
Nonparametric identification using instrumental variables: Sufficient conditions for completeness
Comment: The challenges of multiple causes
Identification and estimation of triangular simultaneous equations models without additivity
Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization
Factor Analysis: Statistical Methods and Practical Issues
Multi-cause causal inference with unmeasured confounding and binary outcome
On characterizing the gamma and the normal distribution
Linear Integral Equations
Measurement bias and effect restoration in causal inference
Tackling the widespread and critical impact of batch effects in high-throughput data
Capturing heterogeneity in gene expression studies by surrogate variable analysis
A general framework for multiple testing dependence
Regularization methods for high-dimensional instrumental variables regression with an application to genetical genomics
Negative controls: A tool for detecting confounding and bias in observational studies
Batch effects correction with unknown subtypes
Nonparametric bounds on treatment effects
Robust Statistics: Theory and Methods (with R)
Completeness of location families, translated moments, and uniqueness of charges
Finite mixture models
Identifying causal effects with proxy variables of an unmeasured confounder
A Confounding Bridge Approach for Double Negative Control Inference on Causal Effects
On varieties of doubly robust estimators under missingness not at random with a shadow variable
Large sample estimation and hypothesis testing
Instrumental variable estimation of nonparametric models
Doubly robust estimation of the local average treatment effect curve
Comment on "Blessings of Multiple Causes"
Counterexamples to "The Blessings of Multiple Causes"
On the nondifferential misclassification of a binary confounder
Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders
Bi-cross-validation for factor analysis
Causal diagrams for empirical research
Principal components analysis corrects for stratification in genome-wide association studies
ACE bounds; SEMs with equilibrium conditions
A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect
Correcting for non-compliance in randomized trials using structural nested mean models
Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome
The central role of the propensity score in observational studies for causal effects
Robust Regression and Outlier Detection
Irx3, a new leader on obesity genetics
Multiply robust causal inference with double negative control adjustment for categorical unmeasured confounding
Bounds on natural direct effects in the presence of confounded intermediate variables
Instrumental variable estimation with a stochastic monotonicity assumption
Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data
An introduction to proximal causal learning
Identifiability of finite mixtures
Statistical Analysis of Finite Mixture Distributions
Confounder adjustment in multiple hypothesis testing
Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables
Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity
The blessings of multiple causes
Towards clarifying the theory of the deconfounder
IGF-binding protein-2 protects against the development of obesity and insulin resistance
Tariff on Animal and Vegetable Oils
On the identifiability of finite mixtures
Nonlinear factor analysis as a statistical method

We are grateful for comments from Eric Tchetgen Tchetgen, the editors, and three anonymous reviewers. The online supplementary material includes proofs of the theorems and propositions, useful lemmas, discussion and examples on the completeness condition, consistency of the least median of squares estimator, discussion on identification of a parametric model for a binary outcome, details for the examples, and additional results for the application. Code for replicating the results is available at https://www.math.pku.edu.cn/teachers/mwfy/Attachments/MultipletreatmentsCodes.zip.