title: Extending the Patra-Sen Approach to Estimating the Background Component in a Two-Component Mixture Model
authors: Arias-Castro, Ery; Jiang, He
date: 2021-06-26

Patra and Sen (2016) consider a two-component mixture model, where one component plays the role of background while the other plays the role of signal, and propose to estimate the background component by simply "maximizing" its weight. While in their work the background component is a completely known distribution, we extend their approach here to three emblematic settings: when the background distribution is symmetric; when it is monotonic; and when it is log-concave. In each setting, we derive estimators for the background component, establish consistency, and provide a confidence band. While the estimation of a background component is straightforward when it is taken to be symmetric or monotonic, when it is log-concave its estimation requires the computation of a largest concave minorant, which we implement using sequential quadratic programming. Compared to existing methods, our method has the advantage of requiring much less prior knowledge on the background component, and is thus less prone to model misspecification. We illustrate this methodology on a number of synthetic and real datasets.

1 Introduction

1.1 Two-component mixture models

Among mixture models, two-component models play a special role. In robust statistics, they are used to model contamination, with the main component representing the inlier distribution and the remaining component representing the outlier distribution (Hettmansperger and McKean, 2010; Huber, 1964; Huber and Ronchetti, 2009; Tukey, 1960). In that kind of setting, the contamination is a nuisance, and the goal is to study how it impacts certain methods for estimation or testing, and also to design alternative methods that behave comparatively better in the presence of contamination. In multiple testing, the background distribution plays the role of the distribution assumed (in a simplified framework) to be common to all test statistics under their respective null hypotheses, while the remaining component plays the role of the distribution assumed of the test statistics under their respective alternative hypotheses (Efron et al., 2001; Genovese and Wasserman, 2002). In an ideal situation where the p-values can be computed exactly and are uniformly distributed on [0, 1] under their respective null hypotheses, the background distribution is the uniform distribution on [0, 1]. Compared to the contamination perspective, here the situation is in a sense reversed, as we are keenly interested in the component other than the background component. We adopt this multiple testing perspective in the present work.

Working within the multiple testing framework, Patra and Sen (2016) posed the problem of estimating the background component as follows. They operated under the assumption that the background distribution is completely known -- a natural choice in many practical situations, see for example the first two situations in Section 2.3. Given a density f representing the density of all the test statistics combined, and letting g_0 denote a completely known density, define

θ_0 ∶= sup{θ ∈ [0, 1] ∶ f − θ g_0 ≥ 0 a.e.}.  (1)

Note that θ_0 ∈ [0, 1]. Under some mild assumptions on f, the supremum is attained, so that f can be expressed as the following two-component mixture:

f = θ_0 g_0 + (1 − θ_0) u,  (2)

for some density u.
Patra and Sen (2016) aim at estimating θ_0 defined in (1) based on a sample from the density f, and implement a slightly modified plug-in approach. Even in this relatively simple setting where the background density -- the completely known density g_0 above -- is given, information on θ_0 can help improve inference in a multiple testing situation, as shown early on by Storey (2002), and even earlier by Benjamini and Hochberg (2000).

We find the Patra-Sen approach elegant, and in the present work we extend it to settings where the background distribution (also referred to as the null distribution) -- not just the background proportion -- is unknown. For an approach that has the potential to be broadly applicable, we consider three emblematic settings where the background distribution is in turn assumed to be symmetric (Section 2), monotone (Section 3), or log-concave (Section 4). Each time, we describe the estimator for the background component (proportion and density) that the Patra-Sen approach leads to, and study its consistency and numerical implementation. We also provide a confidence interval for the background proportion and a simultaneous confidence band for the background density. In addition, in the log-concave setting, we provide a way of computing the largest concave minorant. We address the situation where the background is specified incorrectly, and mention other extensions, including combinations of these settings and multivariate settings, in Section 5.

Turning to related work, several authors consider a Gaussian mixture model and approach the problem via the characteristic function -- a common approach in deconvolution problems. Gaussian mixtures are also considered in (Cai and Jin, 2010; Efron, 2007, 2012), where the Gaussian component corresponding to the null has unknown parameters that need to be estimated. Some references on estimating the null component that have been or could be applied in the context of multiple testing are given in (Efron, 2012, Ch 5). Otherwise, we are also aware of the very recent work of Roquain and Verzelen (2020), who, in addition to studying the 'cost' of having to estimate the parameters of the null distribution when it is assumed Gaussian, also consider the situation where the null distribution belongs to a given location family, and further propose to estimate the null distribution under an upper bound constraint on the proportion of non-nulls in the mixture model.

Remark 1. Much more broadly, all this connects with the vast literature on Gaussian mixture models (Cohen, 1967; Lindsay and Basak, 1993) and on mixture models in general (Lindsay, 1995; McLachlan and Basford, 1988; McLachlan et al., 2019; McLachlan and Peel, 2004), including two-component models (Bordes et al., 2006; Gadat et al., 2020; Ma and Yao, 2015; Shen et al., 2018).

We start with what is perhaps the most natural nonparametric class of null distributions: the class of distributions symmetric about the origin. Unlike Roquain and Verzelen (2020), who assume that the null distribution is symmetric around an unknown location that needs to be estimated but is otherwise known, i.e., its 'shape' is known, we assume that the shape is unknown. We do assume that the center of symmetry is known, but this is for simplicity, as an extension to an unknown center of symmetry is straightforward (see our numerical experiments in Section 2.2). Mixtures of symmetric distributions are considered in (Hunter et al., 2007), but otherwise, we are not aware of works estimating the null distribution under an assumption of symmetry in the context of multiple testing.
For works in multiple testing that assume that the null distribution is symmetric but unknown, but where the goal is either testing the global null hypothesis or controlling the false discovery rate, see (Arias-Castro and Chen, 2017; Arias-Castro and Wang, 2017).

Following in the footsteps of Patra and Sen (2016), we make sense of the problem by defining, for a density f,

π_0 ∶= sup{π ∈ [0, 1] ∶ f = π g + (1 − π) u, with g ∈ S and u a density},  (3)

where S is the class of even densities (i.e., representing a distribution that is symmetric about the origin). Note that π_0 ∈ [0, 1] is well-defined for any density f, with π_0 = 1 if and only if f itself is symmetric.

Theorem 1. We have

π_0 = ∫ f(x) ∧ f(−x) dx.  (4)

Moreover, if π_0 > 0, the supremum in (3) is attained by the following density and no other (as usual, densities are understood up to sets of zero Lebesgue measure):

g_0 ∶= h_0/π_0, where h_0(x) ∶= f(x) ∧ f(−x).  (5)

Proof. The parameter π_0 can be equivalently defined as

π_0 = sup{∫ h ∶ h even and 0 ≤ h ≤ f a.e.}.

Note that h_0, as defined in the statement, satisfies the above conditions, implying that π_0 ≥ ∫ h_0. Take h satisfying these same conditions, namely, h even and 0 ≤ h ≤ f a.e.. Then, for almost every x,

h(x) = h(x) ∧ h(−x) ≤ f(x) ∧ f(−x) = h_0(x).

(Here and elsewhere, a ∧ b is another way of denoting min(a, b).) Hence, ∫ h ≤ ∫ h_0, with equality if and only if h = h_0 a.e., in particular implying that π_0 ≤ ∫ h_0. We have thus established that π_0 = ∫ h_0, and also that ∫ h = π_0 if and only if h = h_0 a.e.. This not only proves (4), but also (5), essentially by definition.

We have thus established that, in the setting of this section, the background component as defined above is given by (π_0, g_0) in (4)-(5), and f can be expressed as a mixture of the background density and another, unspecified, density u, as follows:

f = π_0 g_0 + (1 − π_0) u.

The procedure is summarized in Table 1. An illustration of this decomposition is shown in Figure 1. By construction, the density u is such that it has no symmetric background component in that, for almost every x, u(x) = 0 or u(−x) = 0.

Table 1: Symmetric background computation.
inputs: density f, given center of symmetry c_0 or candidate center points {c_1, c_2, . . . , c_k}
if the center of symmetry is not provided, then select, among the candidates, the center yielding the largest symmetric background proportion
compute h_0(x) ∶= f(x) ∧ f(−x) (with x measured from the chosen center), π_0 ∶= ∫ h_0, and g_0 ∶= h_0/π_0

When all we have to work with is a sample -- x_1, x_2, . . . , x_n ∈ R -- we adopt a straightforward plug-in approach: we estimate the density f, obtaining f̂, and apply the procedure of Table 1, meaning, we compute ĥ_0(x) ∶= min{f̂(x), f̂(−x)}. If we want estimates for the background density and proportion, we simply return π̂_0 ∶= ∫ ĥ_0 and ĝ_0 ∶= ĥ_0/π̂_0. (By convention, we set ĝ_0 to the standard normal distribution if π̂_0 = 0.)

We say that f̂ = f̂_n is locally uniformly consistent for f if E[ess sup_{x∈I} |f̂_n(x) − f(x)|] → 0 as n → ∞ for any bounded interval I. (Here and elsewhere, ess sup_{x∈I} f(x) denotes the essential supremum of f over the set I.) We note that this consistency condition is satisfied, for example, when f is continuous and f̂ is the kernel density estimator with the Gaussian kernel and bandwidth chosen by cross-validation (Chow et al., 1983).

Figure 1: The density f of the Gaussian mixture 0.85 N(0, 1) + 0.15 N(3, 1), in black, and its decomposition into π_0 g_0, in orange, and (1 − π_0) u, in blue. (a) Decomposition of f with center of symmetry specified as 0 (dotted). (b) Decomposition of f with center of symmetry, 0.04 (dotted), found by maximization. We specify the center of symmetry as 0 on the left, and we do not specify the center of symmetry on the right. Notice π_0 = 0.850 on the left and π_0 = 0.860 on the right.
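To make the plug-in procedure of Table 1 concrete, here is a minimal R sketch of the estimator just described. It is an illustration under assumptions of our choosing, not the authors' reference implementation: it uses stats::density with unbiased cross-validation in place of the kedd package, a finite symmetric evaluation grid, and trapezoidal integration; the function name sym_background and all tuning choices are ours.

```r
# Sketch of the plug-in estimate of the largest symmetric background component.
# Assumes the center of symmetry is given (default 0); grid size is illustrative.
sym_background <- function(x, center = 0, n_grid = 2048) {
  x <- x - center                          # re-center so symmetry is about 0
  a <- max(abs(x)) + 3 * sd(x)             # symmetric evaluation window
  grid <- seq(-a, a, length.out = n_grid)
  f_hat <- approxfun(density(x, bw = "ucv"), yleft = 0, yright = 0)
  h0_hat <- pmin(f_hat(grid), f_hat(-grid))  # h0(x) = min(f(x), f(-x))
  delta <- grid[2] - grid[1]
  pi0_hat <- delta * sum((head(h0_hat, -1) + tail(h0_hat, -1)) / 2)  # trapezoid
  g0_hat <- if (pi0_hat > 0) h0_hat / pi0_hat else dnorm(grid)  # paper's convention
  list(pi0 = pi0_hat, grid = grid + center, g0 = g0_hat)
}
```

For instance, on a sample from the mixture of Figure 1, sym_background(x)$pi0 should land near 0.85, while on a symmetric sample it should be close to 1.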
Theorem 2. Suppose that f̂ is a true density and locally uniformly consistent for f. Then ĥ_0 is locally uniformly consistent for h_0 and π̂_0 is consistent, and if π_0 > 0, then ĝ_0 is locally uniformly consistent for g_0.

Proof. All the limits that follow are as the sample size diverges to infinity. We rely on the elementary fact that, for a_1, a_2, b_1, b_2 ∈ R,

|a_1 ∧ b_1 − a_2 ∧ b_2| ≤ |a_1 − a_2| ∨ |b_1 − b_2|,

to get that, for all x,

|ĥ_0(x) − h_0(x)| ≤ |f̂(x) − f(x)| ∨ |f̂(−x) − f(−x)|,

implying that ĥ_0 is locally uniformly consistent for h_0. To be sure, take a bounded interval I, which we assume to be symmetric without loss of generality. Then

ess sup_{x∈I} |ĥ_0(x) − h_0(x)| ≤ ess sup_{x∈I} |f̂(x) − f(x)|,

and we then use the fact that E[ess sup_I |f̂ − f|] → 0. (Here and elsewhere, a ∨ b is another way of denoting max(a, b).)

To conclude, it suffices to show that π̂_0 is consistent for π_0. Fix ε > 0 arbitrarily small. There is a bounded interval I such that ∫_I f ≥ 1 − ε. Then, by the fact that 0 ≤ h_0 ≤ f a.e., ∫_{I^c} h_0 ≤ ∫_{I^c} f ≤ ε. (I^c denotes the complement of I, meaning, I^c = R ∖ I.) Similarly, by the fact that 0 ≤ ĥ_0 ≤ f̂ a.e., ∫_{I^c} ĥ_0 ≤ ∫_{I^c} f̂. From this we gather that

|π̂_0 − π_0| ≤ ∫_I |ĥ_0 − h_0| + ∫_{I^c} ĥ_0 + ∫_{I^c} h_0 ≤ ∫_I |ĥ_0 − h_0| + ∫_{I^c} f̂ + ε.

Thus the consistency of π̂_0 follows if we establish that lim sup ∫_{I^c} f̂ ≤ ε and that ∫_I |ĥ_0 − h_0| → 0. The former comes from the fact that

∫_{I^c} f̂ = 1 − ∫_I f̂ ≤ 1 − ∫_I f + ∫_I |f̂ − f| ≤ ε + |I| ess sup_I |f̂ − f| → ε,

using the fact that f and f̂ are densities and that ∫_{I^c} f ≤ ε. (|I| denotes the Lebesgue measure of I, meaning its length when I is an interval.) For the latter, we have

∫_I |ĥ_0 − h_0| ≤ |I| ess sup_I |ĥ_0 − h_0| → 0,

having already established that ĥ_0 is locally uniformly consistent for h_0.

Confidence interval and confidence band. Beyond mere pointwise consistency, suppose that we have available a confidence band for f, which can be derived under some conditions on f from a kernel density estimator -- see (Chen, 2017) or (Giné and Nickl, 2021, Ch 6.4).

Theorem 3. Suppose that for some α ∈ (0, 1), we have at our disposal f̂_l and f̂_u such that

P(f̂_l(x) ≤ f(x) ≤ f̂_u(x) for almost all x) ≥ 1 − α.  (11)

Then, with probability at least 1 − α,

π̂_l ≤ π_0 ≤ π̂_u and ĝ_l ≤ g_0 ≤ ĝ_u a.e.,  (12)

where

ĥ_l(x) ∶= f̂_l(x) ∧ f̂_l(−x), ĥ_u(x) ∶= f̂_u(x) ∧ f̂_u(−x), π̂_l ∶= ∫ ĥ_l, π̂_u ∶= ∫ ĥ_u, ĝ_l ∶= ĥ_l/π̂_u, ĝ_u ∶= ĥ_u/π̂_l.

Proof. Let Ω be the event that f̂_l(x) ≤ f(x) ≤ f̂_u(x) for almost all x, and note that P(Ω) ≥ 1 − α by assumption. Assuming that Ω holds, we have, for almost all x,

f̂_l(x) ≤ f(x) ≤ f̂_u(x) and f̂_l(−x) ≤ f(−x) ≤ f̂_u(−x),

and taking the minimum in the corresponding places yields

ĥ_l(x) ≤ h_0(x) ≤ ĥ_u(x).

Everything else follows immediately from this.

In words, we apply the procedure of Table 1 to the lower and upper bounds, f̂_l and f̂_u. If the center of symmetry is not provided, then we determine it based on f̂, and then use that same center for f̂_l and f̂_u.
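Continuing the sketch, the construction in Theorem 3 maps a simultaneous band for f into a confidence interval for π_0 and a band for g_0. The snippet below assumes the band (f̂_l, f̂_u) has already been evaluated on a grid that is symmetric about the center; producing such a band (e.g., by bootstrapping a debiased estimator as in Cheng and Chen (2019)) is a separate step that we do not reproduce here.

```r
# From a simultaneous (1 - alpha) band (f_l, f_u) on a symmetric grid, compute
# the interval for pi0 and the band for g0, as in Theorem 3.
sym_background_ci <- function(grid, f_l, f_u) {
  flip <- rev(seq_along(grid))             # index of -x on a symmetric grid
  h_l <- pmin(f_l, f_l[flip])              # h_l(x) = min(f_l(x), f_l(-x))
  h_u <- pmin(f_u, f_u[flip])
  delta <- grid[2] - grid[1]
  pi_l <- delta * sum(h_l)                 # simple Riemann sums
  pi_u <- delta * sum(h_u)
  list(pi0 = c(lower = pi_l, upper = pi_u),
       g_l = h_l / pi_u, g_u = h_u / pi_l) # so that g_l <= g0 <= g_u a.e.
}
```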
In this subsection we provide simulated examples where the background distribution is taken to be symmetric. We acquire f̂ by kernel density estimation using the Gaussian kernel with bandwidth selected by cross-validation (Arlot and Celisse, 2010; Rudemo, 1982; Sheather and Jones, 1991; Silverman, 1986; Stone, 1984), where the cross-validated bandwidth selection is implemented in the kedd package (Guidoum, 2020). The consistency of this density estimator has been proven in (Chow et al., 1983). Although there are many methods that provide confidence bands for kernel density estimators, for example (Bickel and Rosenblatt, 1973; Giné and Nickl, 2010), for considerations of simplicity and intuitiveness, the simultaneous confidence band (in the form of f̂_l and f̂_u) used in our experiments is acquired by bootstrapping a debiased estimator of the density, as proposed in (Cheng and Chen, 2019). For a comprehensive review of kernel density estimation and confidence bands, we point the reader to the recent survey paper (Chen, 2017) and textbook (Giné and Nickl, 2021, Ch 6.4).

In the experiments below, we carry out our method and report the estimated proportion π̂_0, as well as its 95% confidence interval (π̂_0^L, π̂_0^U). We consider both the situation where the center of symmetry is given, and the situation where it is not. In the latter situation, we also report the background component's estimated center, which is selected among several candidate centers and chosen as the one giving the largest symmetric background proportion.

We are not aware of other methods for estimating the quantity π_0, but we provide a comparison with several well-known methods that estimate similar quantities. To begin with, we consider the method of Patra and Sen (2016), which estimates the quantity θ_0 as defined in (1). We let θ̂_0^PSC, θ̂_0^PSH, θ̂_0^PSB denote the constant, heuristic, and 95% upper bound estimators of θ_0, respectively. We then consider estimators of the quantity θ, the actual proportion of the given background component f_b in the mixture density

f = θ f_b + (1 − θ) u,

where u denotes the unknown component. Note that θ_0 and π_0 may be different from θ. This mixture model may not be identifiable in general, and we discuss this issue below. We also consider the estimator of (Efron, 2007), denoted θ̂^E, and implemented in the package locfdr. This method requires the unknown component to be located away from 0, and to have heavier tails than the background component. In addition, when the p-values are known, Meinshausen and Rice (2006) provide a 95% upper bound on the proportion of the null component, and we include that estimator also, denoted θ̂^MR, and implemented in the package howmany. Finally, when the distribution is assumed to be a Gaussian mixture, Cai and Jin (2010) provide an estimator, denoted θ̂^CJ, for when the unknown component is assumed to have a larger standard deviation than the background component. θ̂^CJ requires the specification of a parameter γ, and following the advice given by the authors, we select γ = 0.2. Importantly, unlike our method, these other methods assume knowledge of the background distribution. (The methods of Efron (2007) and Cai and Jin (2010) do not necessitate full knowledge of the background distribution, but we provide them with that knowledge in all the simulated datasets.) We summarize the methods used in our experiments in Table 2. For experiments in situations where the background component is misspecified, we invite the reader to Section 5.1.

We consider four different situations, as listed in Table 3. Each situation's corresponding θ, θ_0, and π_0, defined as in (3), are also presented, where θ_0 and π_0 are obtained numerically based on knowledge of f. For each model, we generate a sample of size n = 1000 and compute all the estimators described above. We repeat this process 1000 times. We transform the data accordingly when applying the comparison methods. The results of our experiment are reported, in terms of mean values as well as standard deviations, in Table 4. It can be seen that in most situations, our estimator achieves comparable if not better performance when estimating π_0 as compared to the other methods for the parameter they are meant to estimate (θ or θ_0). We also note that our method is significantly influenced by the estimation of f̂; in situations where f̂ often deviates from f, our estimator will likely incur a higher error. In addition, it is clear from the experiments that specifying the center of symmetry is unnecessary.

Table 3: Simulated situations for the estimation of a symmetric background component, together with the corresponding values of θ, θ_0, π_0 (unspecified center), and π_00 (given center), obtained numerically (and rounded to 3 decimals).
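As an aside, the population-level values of π_0 reported in Table 3 can be obtained by direct numerical integration via Theorem 1. The following sketch (ours, using only base R) recovers π_0 for the mixture of Figure 1:

```r
# pi0 for f = 0.85 N(0,1) + 0.15 N(3,1), via pi0 = integral of min(f(x), f(-x))
f  <- function(x) 0.85 * dnorm(x) + 0.15 * dnorm(x, mean = 3)
h0 <- function(x) pmin(f(x), f(-x))
integrate(h0, -Inf, Inf)$value   # approximately 0.850, as in Figure 1(a)
```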
In this subsection we examine six real datasets where the null component could reasonably be assumed to be symmetric. We begin with two datasets where we have sufficient information on the background component.

The first one is the Prostate dataset (Singh et al., 2002), which contains gene expression levels for n = 6033 genes on 102 men, 50 of whom are control subjects and 52 of whom are prostate cancer patients. The main objective is to discover the genes that have a different expression level in the control and prostate patient groups. For each gene, we conduct a two-sided two-sample t test on the control subjects and prostate patients, and then transform these t statistics into z values using

z = Φ^{-1}(F_100(t)),  (15)

where Φ denotes the cdf of the standard normal distribution, and F_100 denotes the cdf of the t distribution with 100 degrees of freedom. We work with these n = 6033 z values. Following (Efron, 2007), the background component here could reasonably be assumed to be N(0, 1). The results of the different proportion estimators compared in Section 2.2 are shown in the first row of Table 5. The fitted largest symmetric component as well as confidence bands are plotted in Figure 2(a).

Next we consider the Carina dataset (Walker et al., 2007), which contains the radial velocities of n = 1266 stars in Carina, a dwarf spheroidal galaxy, mixed with those of Milky Way stars in the field of view. As Patra and Sen (2016) stated, the background distribution of the radial velocity, bgstars, can be acquired from (Robin et al., 2003). The various estimators are computed and shown in the second row of Table 5. The fitted largest symmetric component as well as confidence bands are plotted in Figure 2(b).

Table 4: A comparison of various methods for estimating a symmetric background component in the situations of Table 3. For our method, the first and second rows in each situation are for when the center is unspecified, while the third and fourth rows are for when the center is specified to be the origin. Thus π̂_0 on the first row of each situation is compared with π_0, and π̂_0 on the third row of each situation is compared with π_00. Otherwise, the θ̂_0^X are compared with θ_0, while the θ̂^X are compared with θ.

Aside from these two real datasets where we know the background distribution, we consider four other real datasets -- three microarray datasets and one police dataset -- where we do not know the null distribution (Efron, 2012). Here, out of the methods considered above, only our method and those of (Efron, 2007) and (Cai and Jin, 2010) are applicable. For the first comparison method we use the MLE estimation as presented in (Efron, 2007, Sec 4), which is usually very close to the result of Central Matching as used in (Efron, 2012). For the second comparison method we use the estimator in (Cai and Jin, 2010, Sec 3.2), although we use γ = 0.1, as the recommended value γ = 0.2 leads to significant underestimation of the background proportion. Note that both of these methods are still meant to estimate θ.

The HIV dataset (van't Wout et al., 2003) consists of a study of 4 HIV subjects and 4 control subjects. The measurements of n = 7680 gene expression levels were acquired using cDNA microarrays on each subject. We compute t statistics for the two-sided t test and then transform them into z values using (15), with 6 degrees of freedom here. We would like to know what proportion of these genes do not show a significant difference in expression levels between HIV and control subjects. The results are summarized in the first row of Table 6. The fitted largest symmetric component as well as confidence bands are shown in Figure 3(a).
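The transform (15) is a one-liner in R; the sketch below applies it as we understand it, with the degrees of freedom as the only input (100 for the Prostate data, 6 for the HIV data):

```r
# z values from two-sample t statistics, as in (15): z = Phi^{-1}(F_df(t))
t_to_z <- function(t_stat, df) qnorm(pt(t_stat, df = df))

# Sanity check: under the null, the transformed values are standard normal.
z_null <- t_to_z(rt(1e5, df = 100), df = 100)
c(mean(z_null), sd(z_null))   # both close to (0, 1)
```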
The Leukemia dataset comes from (Golub et al., 1999). There are 72 patients in this study, of which 45 have ALL (Acute Lymphoblastic Leukemia) and 27 have AML (Acute Myeloid Leukemia), with AML being considered more severe. High density oligonucleotide microarrays gave expression levels on n = 7128 genes. Following (Efron, 2012, Ch 6.1), the raw expression levels on each microarray, x_{i,j} for gene i on array j, were transformed to a normal score, of the form Φ^{-1}((rank(x_{i,j}) − 0.5)/n), where rank(x_{i,j}) denotes the rank of x_{i,j} among the n raw values of array j. t tests were then conducted on ALL and AML patients, and the t statistics were transformed to z values according to (15), now with 70 degrees of freedom. As before, we would like to know the proportion of genes that do not show a significant difference in expression levels between ALL and AML patients. The results are summarized in the second row of Table 6. The fitted largest symmetric component as well as confidence bands are shown in Figure 3(b).

The Parkinson dataset comes from (Lesnick et al., 2007). In this dataset, substantia nigra tissue -- a brain structure located in the mesencephalon that plays an important role in reward, addiction, and movement -- from the postmortem brains of normal subjects and Parkinson's disease patients was used for RNA extraction and hybridization, done on Affymetrix microarrays. There are n = 54277 nucleotide sequences whose expression levels were measured on 16 Parkinson's disease patients and 9 control patients. We wish to find out the proportion of sequences that do not show a significant difference between Parkinson and control patients. The results are summarized in the third row of Table 6. The fitted largest symmetric component as well as confidence bands are shown in Figure 3(c).

The Police dataset is analyzed in (Ridgeway and MacDonald, 2009). In 2006, based on 500000 pedestrian stops in New York City, each of the city's n = 2749 police officers that were regularly involved in pedestrian stops was assigned a z score on the basis of their stop data, in consideration of possible racial bias. For details on computing this z score, we refer the reader to (Efron, 2012; Ridgeway and MacDonald, 2009). Large positive z values are considered possible evidence of racial bias. We would like to know the percentage of these police officers that do not exhibit racial bias in pedestrian stops. The estimated proportions are reported in the last row of Table 6. The symmetric component as well as confidence bands are presented in Figure 3(d).

In this section, we turn our attention to extracting from a density its monotone background component, following the Patra-Sen approach. For this to make sense, we only consider densities supported on R_+ = [0, ∞); in fact, all the densities we consider in this section will be supported on R_+. For such a density f, we thus define

π_0 ∶= sup{π ∈ [0, 1] ∶ f = π g + (1 − π) u, with g ∈ M and u a density},  (17)

where M is the class of monotone (necessarily non-increasing) densities on R_+. Note that π_0 ∈ [0, 1] is well-defined for any density f, with π_0 = 1 if and only if f itself is monotone. Recall that the essential infimum of a measurable set A, denoted ess inf A, is defined as the supremum over t ∈ R such that A ∩ (−∞, t) has Lebesgue measure zero. Everywhere in this section, we will assume that f is càdlàg, meaning that, at any point, it is continuous from the right and admits a limit from the left.
Theorem 4. Assuming f is càdlàg, we have

π_0 = ∫ h_0, where h_0(x) ∶= ess inf{f(y) ∶ y ≤ x}.  (18)

Moreover, if π_0 > 0, the supremum in (17) is attained by the following density and no other:

g_0 ∶= h_0/π_0.  (19)

Note that π_0 = 0 (i.e., f has no monotone background component) if and only if ess inf{f(y) ∶ y ≤ x} = 0 for all x > 0, or equivalently, if {x ∈ [0, t] ∶ f(x) ≤ ε} has positive measure for all t > 0 and all ε > 0. If f is càdlàg, this condition reduces to inf{f(y) ∶ 0 ≤ y ≤ x} = 0 for all x > 0.

Proof. Note that π_0 can be equivalently defined as

π_0 = sup{∫ h ∶ h monotone and 0 ≤ h ≤ f a.e.}.

Note that h_0, as defined in the statement, satisfies the above conditions, implying that π_0 ≥ ∫ h_0. Take h satisfying these same conditions, namely, h is monotone and 0 ≤ h(x) ≤ f(x) for almost all x, say, for x ∈ R_+ ∖ A, where A has Lebesgue measure zero. Take such an x. Then for any y ≤ x with y ∉ A, we have h(x) ≤ h(y) ≤ f(y), so that

h(x) ≤ ess inf{f(y) ∶ y ≤ x} = h_0(x),

where the second inequality comes from the fact that A has zero Lebesgue measure and the definition of essential infimum. Hence, ∫ h ≤ ∫ h_0, with equality if and only if h = h_0 a.e., in particular implying that π_0 ≤ ∫ h_0. We have thus established that π_0 = ∫ h_0, and also that ∫ h = π_0 if and only if h = h_0 a.e.. This not only proves (18), but also (19).

We have thus established that, in the setting of this section where f is assumed to be càdlàg, the background component as defined above is given by (π_0, g_0) in (18)-(19), and f can be expressed as a mixture of the background density and another, unspecified, density u, as follows:

f = π_0 g_0 + (1 − π_0) u.

The procedure is summarized in Table 7. An illustration of this decomposition is shown in Figure 4. (In this section, E(σ) denotes the exponential distribution with scale σ and G(κ, σ) denotes the Gamma distribution with shape κ and scale σ. Recall that E(σ) ≡ G(1, σ).) By construction, the density u is such that it has no monotone background component in that ess inf{u(y) ∶ y ≤ x} = 0 for any x > 0.

In practice, when all we have is a sample of observations, x_1, . . . , x_n ∈ R_+, we first estimate the density, resulting in f̂, and then compute the quantities defined in (18) and (19) with f̂ in place of f. Thus our estimates are ĥ_0(x) ∶= ess inf{f̂(y) ∶ y ≤ x}, π̂_0 ∶= ∫ ĥ_0, and ĝ_0 ∶= ĥ_0/π̂_0.

Table 7: Monotone background computation.
inputs: density f defined on [0, ∞)
compute h_0(x) ∶= ess inf{f(y) ∶ y ≤ x}, π_0 ∶= ∫ h_0, and (if π_0 > 0) g_0 ∶= h_0/π_0

Figure 4: The density f of a mixture of E(1) (weight 0.85) and a Gamma density (weight 0.15), in black, and its decomposition into π_0 g_0, in orange, and (1 − π_0) u, in blue. Notice that the orange curve has a flat part around the middle and is slightly different from 0.85 E(1).

Theorem 5. Assume f is càdlàg, and suppose that f̂ is a true density, càdlàg, and locally uniformly consistent for f. Then ĥ_0 is locally uniformly consistent for h_0 and π̂_0 is consistent for π_0, and if π_0 > 0, then ĝ_0 is locally uniformly consistent for g_0.

Proof. From the definitions, h_0(x) = ess inf{f(y) ∶ y ≤ x} and ĥ_0(x) ∶= ess inf{f̂(y) ∶ y ≤ x}, so that, for any a > 0,

ess sup_{x∈[0,a]} |ĥ_0(x) − h_0(x)| ≤ ess sup_{y∈[0,a]} |f̂(y) − f(y)|,

from which we get that ĥ_0 is locally uniformly consistent for h_0 whenever f̂ is locally uniformly consistent for f. For the remainder of the proof, we can follow in the footsteps of the proof of Theorem 2, based on the fact that, for any a > 0,

∫_{[0,a]} |ĥ_0 − h_0| → 0,

where the limit is in expectation as the sample size increases.

Confidence interval and confidence band. To go beyond point estimators, we suppose that we have available a confidence band for f and deduce from it a confidence interval for π_0 and a confidence band for g_0.

Theorem 6. Suppose that we have a confidence band for f as in (11). Then (12) holds with probability at least 1 − α, where now

ĥ_l(x) ∶= ess inf{f̂_l(y) ∶ y ≤ x}, ĥ_u(x) ∶= ess inf{f̂_u(y) ∶ y ≤ x}, π̂_l ∶= ∫ ĥ_l, π̂_u ∶= ∫ ĥ_u, ĝ_l ∶= ĥ_l/π̂_u, ĝ_u ∶= ĥ_u/π̂_l.

The proof is straightforward and thus omitted.
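A minimal R sketch of the monotone plug-in estimator follows; again, it is our illustration, not the authors' code. The density is estimated by the reflection device of Schuster (1985) (estimate a density from the sample augmented with its mirror image, then fold it back onto [0, ∞)), and the running infimum in (18) becomes cummin on a grid.

```r
# Plug-in estimate of the largest monotone background component on [0, Inf).
mono_background <- function(x, n_grid = 2048) {
  a <- 1.1 * max(x)
  d <- density(c(x, -x), bw = "ucv", from = 0, to = a, n = n_grid)
  f_hat <- 2 * d$y                 # reflection estimator, folded onto [0, a]
  h0_hat <- cummin(f_hat)          # h0(t) = inf of f_hat over [0, t]
  delta <- d$x[2] - d$x[1]
  pi0_hat <- delta * sum(h0_hat)
  list(grid = d$x, pi0 = pi0_hat,
       g0 = if (pi0_hat > 0) h0_hat / pi0_hat else NULL)
}

# Illustration on an exponential-plus-Gamma mixture in the spirit of Figure 4
# (the Gamma parameters below are ours, chosen only for the example).
x <- c(rexp(850), rgamma(150, shape = 8))
mono_background(x)$pi0   # should land near 0.85
```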
Remark 2. So far, we have assumed that the monotone density is supported on [0, ∞), but in principle we can also consider the starting point as unspecified. If this is the case, similar to what we did for the case of a symmetric component in Table 1, we can again consider several candidate locations defining the monotone component's support, and select the one yielding the largest monotone component weight.

We are here dealing with densities supported on [0, ∞), and what happens near the origin is completely crucial, as is transparent from the definition of ĥ_0. It is thus important in practice to choose an estimator for f that behaves well in the vicinity of the origin. As it is well known that kernel density estimators have a substantial bias near the origin, we opted for a different estimator. Many density estimation methods have been proposed to deal with boundary effects, including smoothing splines (Gu, 1993; Gu and Qiu, 1993), local density estimation approaches (Fan, 1993; Hjort and Jones, 1996; Loader, 1996; Park et al., 2002), and local polynomial approximation methods (Cattaneo et al., 2019, 2020). For considerations of simplicity and intuitiveness, we use kernel density estimation with a reflection about the boundary point (Cline and Hart, 1991; Karunamuni and Alberts, 2005; Schuster, 1985). We acquire f̂ using the reflection method of (Schuster, 1985) and, as we did before, we acquire a 95% confidence band [f̂_l, f̂_u] from (Cheng and Chen, 2019). We also note that f̂ is consistent for f, as shown by Schuster (1985).

We consider two different situations, as listed in Table 8. Each situation's corresponding θ, θ_0, and π_0, defined as in (17), are also presented. We again generate a sample of size n = 1000 from each model, and repeat each setting 1000 times. The mean values as well as standard deviations of our method and related methods are reported in Table 9. In situation M1, our estimator achieves a smaller estimation error for π_0 than all the other methods achieve for their corresponding target, either θ or θ_0, even with much less information on the background component. In situation M2, our method has a slightly higher error than θ̂_0^PSH and θ̂^E, both of which have complete information on the background component, but a lower error than θ̂_0^PSC and θ̂^CJ.

Table 9: A comparison of various methods for estimating a monotone background component in the situations of Table 8. As always, π̂_0 is compared with π_0, the θ̂_0^X are compared with θ_0, while the θ̂^X are compared with θ.

In this subsection we consider a real dataset where the background component could be assumed to be monotone non-increasing. We look at the Coronavirus dataset (Dong et al., 2021), acquired from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. It is well known that new coronavirus cases have been consistently decreasing in the USA recently, and this trend can be seen to begin on Jan 8, 2021, as shown by the New York Times interactive data. For each person infected on or after Jan 8, 2021, we count the number of days between Jan 8, 2021 and the day they were infected. We are interested in quantifying how monotone that downward trend in coronavirus infections is. As we do not know the actual background distribution here, and Gaussian distributions are not of particular relevance, none of the other comparison methods in Section 2.2 are applicable, and we therefore only provide our method's estimate in Table 10 and Figure 5. Numerically, it can be seen that the background monotone component accounts for around 96.7% of the new cases arising on or after Jan 8, 2021.
Our last emblematic setting is that of extracting a log-concave background component from a density. Log-concave densities have been widely studied in the literature (Samworth, 2018; Walther, 2009), and include a wide variety of distributions, including all Gaussian densities, all Laplace densities, all exponential densities, and all uniform densities. They have been extensively used in mixture models (Chang and Walther, 2007; Hu et al., 2016; Pu and Arias-Castro, 2020). Following Patra and Sen (2016), for a density f we define

π_0 ∶= sup{π ∈ [0, 1] ∶ f = π g + (1 − π) u, with g ∈ C and u a density},  (24)

where C is the class of log-concave densities. Note that π_0 ∈ [0, 1], with π_0 = 1 if and only if f itself is log-concave.

Theorem 7. π_0 is the value of the following optimization problem:

maximize ∫ h over h ∶ R → R_+ log-concave with 0 ≤ h ≤ f a.e..  (25)

Moreover, this problem admits a solution, although it may not be unique.

Proof. From the definition in (24), it is clear that π_0 coincides with the value of (25). Thus we only need to show that the problem (25) admits a solution -- and then show that it may not be unique -- to complete the proof. Here the arguments are a bit more involved.

Let (h_k) be a solution sequence for the problem (25), meaning that h_k ∶ R → R_+ is log-concave and satisfies h_k ≤ f, and that q_k ∶= ∫ h_k converges, as k → ∞, to the value of the optimization problem (25), denoted q* henceforth. Note that 0 ≤ q_k ≤ 1 since 0 ≤ h_k ≤ f and ∫ f = 1, implying that 0 ≤ q* ≤ 1. We only need to consider the case where q* > 0, for when q* = 0 the constant function h ≡ 0 is a solution. Without loss of generality, we assume that q_k > 0 for all k. Each h_k is log-concave, and from this we know the following: its support is an interval, which we denote [a_k, b_k] with −∞ ≤ a_k < b_k ≤ ∞; h_k is continuous and strictly positive on (a_k, b_k); and h_k is unimodal, meaning that there is x such that h_k is non-decreasing in y ≤ x and non-increasing in y ≥ x. Extracting a subsequence if needed, we assume without loss of generality that a_k → a and b_k → b as k → ∞, for some −∞ ≤ a ≤ b ≤ ∞. Define

H_k(x) ∶= ∫_{−∞}^x h_k(y) dy.

We have that H_k is non-decreasing, and if s ≤ t, then H_k(t) − H_k(s) ≤ F(t) − F(s), because h_k ≤ f (F denoting the distribution function associated with f). In particular, by Helly's selection theorem (equivalent to Prokhorov's theorem when dealing with distribution functions), we can extract a subsequence that converges pointwise. Without loss of generality, we assume that (H_k) itself converges, and let H denote its limit. Note that H is constant outside of [a, b]. In fact, H(x) = 0 when x < a, because x < a_k eventually, forcing H_k(x) = 0. For x, x′ > b, we have that x, x′ > b_k for large enough k, implying that H_k(x) = H_k(x′), yielding H(x) = H(x′) by taking the limit k → ∞. Note also that, for all s ≤ t,

H(t) − H(s) ≤ F(t) − F(s) = ∫_s^t f(x) dx.

This implies that H is absolutely continuous with derivative, denoted h, satisfying h ≤ f a.e.. We claim that h is a solution to (25). We already know that 0 ≤ h ≤ f a.e.. In addition, we also have ∫ h ≥ q*. To see this, fix ε > 0, and let t be large enough that ∫_{|x|>t} f ≤ ε. Since h_k ≤ f, we have H_k(t) − H_k(−t) ≥ q_k − ε for all k, and taking the limit k → ∞ yields H(t) − H(−t) ≥ q* − ε. Hence ∫ h ≥ H(t) − H(−t) ≥ q* − ε, and ε > 0 being otherwise arbitrary, we deduce that ∫ h ≥ q*.

It thus remains to show that h is log-concave. We establish this claim by proving that, extracting a subsequence if needed, h_k converges to h a.e., and it is enough to do so on an interval. Thus let x_0 and ∆ > 0 be such that [x_0 − 2∆, x_0 + 2∆] ⊂ (a, b). Note that H is strictly increasing on (a, b), because each H_k is strictly increasing on (a_k, b_k) due to h_k being log-concave. Take any x_0 − ∆ ≤ x < y ≤ x_0 + ∆ and let δ_k = (log h_k(y) − log h_k(x))/(y − x). We assume, for example, that δ_k ≥ 0, and bound it from above. Since H_k → H with H strictly increasing on (a, b), there is ∆_1 > 0 such that, eventually, ∫_{x_0−2∆}^{x_0−∆} h_k ≥ ∆_1, while the concavity of log h_k forces h_k to decay at least geometrically, at rate δ_k, to the left of x. Combining these two facts allows us to derive δ_k ≤ M_1 ∶= ∆^{−1} log(2/∆_1).
We can deal with the case where δ_k < 0 by symmetry, obtaining that, for all k sufficiently large and all x_0 − ∆ ≤ x < y ≤ x_0 + ∆,

|log h_k(y) − log h_k(x)| ≤ M_1 (y − x).

Extracting a subsequence if needed, and by symmetry, assume that this is the case for all k large enough. A similar argument, again based on the convergence of H_k towards H, provides a constant ∆_2 > 0 such that, eventually, h_k(x_0) ≥ ∆_2; and we also have h_k(x_0) ≤ f(x_0), so that, together,

|log h_k(x_0)| ≤ M_2 ∶= |log ∆_2| ∨ |log f(x_0)|.

With the triangle inequality, we thus have, for all x ∈ [x_0 − ∆, x_0 + ∆],

|log h_k(x)| ≤ M_2 + M_1 ∆.

The family of functions (log h_k) (starting at k large enough) is thus uniformly bounded and equicontinuous on [x_0 − ∆, x_0 + ∆], so that by the Arzelà-Ascoli theorem, (log h_k) is precompact for the uniform convergence on that interval. Therefore, the same is true for (h_k). Let h_∞ be the uniform limit of a subsequence, and note that h_∞ is continuous. h_∞ must be a weak limit as well, since uniform convergence on a compact interval implies weak convergence on that interval. Therefore h_∞ = h a.e. on that interval, since (h_k) converges weakly to h. Hence, all the uniform limits of (h_k) must coincide with h a.e., and since any such limit must be continuous, they are all the same. We conclude that, on the interval under consideration, h is equal a.e. to a continuous function which is the (only) uniform limit of (h_k); in particular, (h_k) converges pointwise a.e. to h on that interval, and h is therefore log-concave there. Thus, we have proved that the optimization problem (25) has at least one solution.

We now show that there may be multiple solutions. This is the case, for instance, when f = (1/m) f_1 + ⋯ + (1/m) f_m, where each f_j is a log-concave density and these densities have pairwise disjoint supports. In that case, any of the components, meaning any f_j/m, is a solution to (25), and these are the only solutions. This comes from the fact that the support of a log-concave distribution is necessarily an interval.

We have thus established that a density has at least one log-concave background component, and possibly multiple ones, corresponding to the solutions to (25). If h is one such solution, then π_0 = ∫ h, and we may define the corresponding density as g ∶= h/π_0 if π_0 > 0. Then f can be expressed as a mixture of the background density g and another, unspecified, density u, as follows:

f = π_0 g + (1 − π_0) u.

(We do not use the notation h_0 and g_0 here, since these may not be uniquely defined.) The procedure is summarized in Table 11. An illustration of this decomposition is shown in Figure 6. Note that the density u may have a non-trivial log-concave background component. This is the case, for example, if f is a (nontrivial) mixture of two log-concave densities with disjoint supports, in which case u is one of these log-concave densities.

Table 11: Log-concave background computation. (The input d is used to initialize the function v -- discretized as v below -- that is bounded from above by log f. In our experiments, we chose d = 0.02.)
inputs: equally spaced gridpoints t = {t_1, t_2, . . . , t_k}, density f, a boolean R indicating whether to use the Riemann integral approximation, initialization amount d
if R, do optimization (44) using SQP, and record the optimizer v and the maximum as π_0;
else, do optimization (33) using SQP, and record the optimizer v and the maximum as π_0
return v, π_0

Figure 6: The density f of the Gaussian mixture 0.85 N(0, 1) + 0.15 N(3, 1), in black, and its decomposition into π_0 g, in orange, and (1 − π_0) u, in blue. The orange curve is acquired by maximizing (33), but maximizing the Riemann integral approximation (44) gives almost the same result (with the maximum difference between the two curves being 2 × 10^{−14}). Notice that π_0 = 0.931, as π_0 g also takes a non-negligible weight from the smaller component 0.15 N(3, 1).
In practice, based on a sample, we estimate f, resulting in f̂, and simply obtain estimates by plug-in as we did before, this time via

maximize ∫ h over h log-concave with 0 ≤ h ≤ f̂ a.e.,  (28)

setting π̂_0 to the value of (28). At least formally, let ĥ be a solution to (28). We then define

ĝ ∶= ĥ/π̂_0, assuming π̂_0 > 0.

Here, to avoid technicalities, we work under the assumption that the density estimator f̂ = f̂_n satisfies, for any bounded interval I,

E[ess sup_{x∈I} |f̂_n(x)/f(x) − 1|] → 0 as n → ∞.  (30)

This condition is better suited for when the density f approaches 0 only at infinity. Also, for the sake of simplicity, we only establish consistency for π̂_0.

Theorem 8. When (30) holds, π̂_0 is consistent.

Proof. Fix η > 0 and a > 0, and consider the event, denoted Ω, that

ess sup_{x∈[−a,a]} |f̂(x)/f(x) − 1| ≤ η.

Because of (30), Ω happens with probability tending to 1 as the sample size increases. Let ε = 1 − ∫_{[−a,a]} f, which is small when a is large. Assume that Ω holds. Let h be a solution to (25), so that π_0 = ∫ h. Then define h̃(x) = (1 − η) h(x) 1{|x| ≤ a} and note that

∫ h̃ = (1 − η) ∫_{[−a,a]} h ≥ (1 − η)(π_0 − ε),

so that ∫ h̃ is close to π_0 when η is small and a is large. We also note that h̃ is log-concave and satisfies 0 ≤ h̃ ≤ (1 − η) f a.e.. Under Ω, we have (1 − η) f ≤ f̂ on [−a, a], and so it is also the case that h̃ ≤ f̂, assuming as we do that η and ε are small enough. Then ∫ h̃ ≤ π̂_0, by definition of π̂_0. Gathering everything, we obtain that (1 − η)(π_0 − ε) ≤ π̂_0. By letting η → 0 and a → ∞, so that ε → 0, we have established that lim inf π̂_0 ≥ π_0 in probability. (The lim inf needs to be understood as the sample size increases.) The reverse relation, meaning lim sup π̂_0 ≤ π_0, can be derived similarly, starting with a solution to (28).

Confidence interval. Once again, if we have available a confidence band for f, we can deduce from it a confidence interval for π_0.

Theorem 9. Suppose that we have a confidence band for f as in (11). Then

π̂_l ≤ π_0 ≤ π̂_u

holds with probability at least 1 − α, where π̂_l and π̂_u are the values of the optimization problem (25) with f replaced by f̂_l and f̂_u, respectively.

The proof is straightforward and thus omitted.

Unlike in the previous sections, here the computation of our estimator(s) is non-trivial: indeed, after computing f̂ with an off-the-shelf procedure, we need to solve the optimization problem (28). Thus, in this section, we discuss how to solve this optimization problem. Although least concave majorants (or, equivalently, greatest convex minorants) have been considered extensively in the literature, for example in (Franců et al., 2017; Jongbloed, 1998), the problem (25) calls for a type of greatest concave minorant, and we were not able to find references in the literature that directly tackle this problem. We do want to mention (Gorokhovik, 2019), where a similar concept is discussed, but the definition there is different from ours and no numerical procedure to solve the problem is provided. For lack of structure to exploit, we propose a direct discretization followed by an application of sequential quadratic programming (SQP), first proposed by Wilson (1963). For more details on SQP, we point the reader to (Gill and Wong, 2012) or (Nocedal and Wright, 2006, Ch 18). Going back to (25), where here f plays the role of a generic density on the real line, the main idea is to restrict v ∶= log h to be a continuous, concave, piecewise linear function.
Once discretized, the integral admits a simple closed-form expression and the concavity constraint is transformed into a set of linear inequality constraints. To set up the discretization, for k ≥ 1 integer, let t_{−k,k} < t_{−k+1,k} < ⋯ < t_{k−1,k} < t_{k,k} be such that t_{j,k} = −t_{−j,k} (symmetry) and t_{j+1,k} − t_{j,k} = δ_k (equispaced) for all j, with δ_k → 0 (dense) and t_{k,k} → ∞ as k → ∞ (spanning the real line). Suppose that v is concave with v ≤ log f, and to that function associate the triangular sequence v_{j,k} ∶= v(t_{j,k}). Then, for each k,

v_{j+1,k} − v_{j,k} ≤ v_{j,k} − v_{j−1,k} for all j,

by the fact that v is concave. In addition, v_{j,k} = v(t_{j,k}) ≤ log f(t_{j,k}) =∶ u_{j,k} (which are given). Instead of working directly with a generic concave function v, we work with those that are piecewise linear, as they are uniquely determined by their values at the grid points if we further restrict them to be −∞ on (−∞, t_{−k,k}) ∪ (t_{k,k}, +∞). Effectively, at k, we replace in (25) the class C with the class C_k of functions h such that v = log h is concave, linear on each interval [t_{j,k}, t_{j+1,k}], v(x) = −∞ for x < t_{−k,k} or x > t_{k,k}, and such that v(t_{j,k}) ≤ log f(t_{j,k}) for all j. This leads us to the following optimization problem, which instead of being over a function space is over a Euclidean space:

maximize Λ(v) over v = (v_{−k,k}, . . . , v_{k,k}) ∈ R^{2k+1}, subject to v_{j+1,k} − 2 v_{j,k} + v_{j−1,k} ≤ 0 and v_{j,k} ≤ u_{j,k} for all j,  (33)

where

Λ(v) ∶= Σ_{j=−k}^{k−1} δ_k (exp(v_{j+1,k}) − exp(v_{j,k}))/(v_{j+1,k} − v_{j,k}),

with the convention that the j-th summand equals δ_k exp(v_{j,k}) when v_{j+1,k} = v_{j,k}; this is the exact integral of exp(v) when v is the piecewise linear function determined by the v_{j,k}. As we are not aware of any method that could solve (33) exactly, we use sequential quadratic programming (SQP). In our implementation we use the R package nloptr, based on an original implementation of Kraft (1988).
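Here is a hedged sketch of this step in R. It solves the Riemann variant (44) (simpler than the exact Λ, and reported below as giving near-identical output) with nloptr::slsqp, the SLSQP implementation descended from Kraft (1988); the helper name and the initialization v = log f − d (mirroring the input d of Table 11) are ours, and f must be strictly positive on the grid (cf. Remark 3).

```r
library(nloptr)

# Discretized log-concave background problem, Riemann form (44):
# maximize delta * sum(exp(v_j)) subject to concavity of v and v_j <= log f(t_j).
logconcave_background <- function(t, f_vals, d = 0.02) {
  delta <- t[2] - t[1]
  k <- length(t)
  u <- log(f_vals)                           # upper bounds at the grid points
  obj  <- function(v) -delta * sum(exp(v))   # slsqp minimizes, so negate
  grad <- function(v) -delta * exp(v)
  # Concavity: v[j-1] - 2 v[j] + v[j+1] <= 0; slsqp expects hin(v) >= 0.
  hin <- function(v) -(v[-c(1, 2)] - 2 * v[-c(1, k)] + v[-c(k - 1, k)])
  res <- slsqp(x0 = u - d, fn = obj, gr = grad, hin = hin, upper = u)
  list(pi0 = -res$value, v = res$par)
}

# Usage on a fitted density f_hat evaluated on an equispaced grid, e.g.:
# t <- seq(-5, 5, length.out = 201); logconcave_background(t, f_hat(t))$pi0
```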
Theorem 10. Assume that f is Lipschitz and locally bounded away from 0. Then the value of the discretized optimization problem (33) converges, as k → ∞, to the value of the original optimization problem (25).

The assumptions made on f are really for convenience -- to expedite the proof of the result while still including interesting situations -- and we do expect that the result holds more broadly. We also note that, in our workflow, the problem (33) is solved for f̂, and that f̂ satisfies these conditions (remember that the sample size is held fixed here) if it is a kernel density estimator based on a smooth kernel function supported on the entire real line, like the Gaussian kernel.

Proof. Let h be a solution to (25), so that ∫ h = π_0, where π_0 denotes the value of (25). Define v = log h and let v_{j,k} = v(t_{j,k}). As we explained above, this makes v_k ∶= (v_{−k,k}, . . . , v_{k,k}) feasible for (33). Therefore Λ(v_k) ≤ π_{0,k}, where π_{0,k} denotes the value of (33). On the other hand, let ṽ_k denote the piecewise linear approximation to v on the grid, meaning that ṽ_k(x) = −∞ if x < t_{−k,k} or x > t_{k,k}, ṽ_k(t_{j,k}) = v(t_{j,k}), and ṽ_k is linear on [t_{j,k}, t_{j+1,k}] for all j. Setting h̃_k ∶= exp(ṽ_k), we then have

Λ(v_k) = ∫ h̃_k → ∫ h = π_0,

where the convergence is justified, for example, by the fact that h̃_k → h pointwise and 0 ≤ h̃_k ≤ h (because, by concavity of v, v ≥ ṽ_k), so that the dominated convergence theorem applies. From this we get that lim inf_{k→∞} π_{0,k} ≥ π_0.

In the other direction, let v_k be a solution of (33), so that Λ(v_k) = π_{0,k}. Let ṽ_k denote the linear interpolation of v_k on the grid (t_{−k,k}, . . . , t_{k,k}), defined exactly as done above, and let h̃_k = exp(ṽ_k). We have that h̃_k ≤ f at the grid points, but not necessarily elsewhere. To work in that direction, fix a > 0, taken arbitrarily large in what follows. Because of our assumptions on f, we have that u ∶= log f is Lipschitz on [−a, a], say with Lipschitz constant L. In particular, with ũ_k denoting the linear approximation of u based on the grid, we have

sup_{[−a,a]} |ũ_k − u| ≤ η_k ∶= L δ_k.

Define v̄_k ∶= ṽ_k − η_k, and note that v̄_k is also concave and piecewise linear; because h̃_k ≤ f at the grid points, we have ṽ_k ≤ ũ_k, so that

v̄_k = ṽ_k − η_k ≤ ũ_k − η_k ≤ u on [−a, a].

In particular, h̄_k ∶= exp(v̄_k) 1{[−a, a]} is feasible for (25), implying that ∫ h̄_k ≤ π_0. On the other hand, we have

π_{0,k} = ∫ h̃_k ≤ exp(η_k) ∫_{[−a,a]} h̄_k + ∫_{[−a,a]^c} h̃_k,

and so, because h̃_k ≤ f̃_k ∶= exp(ũ_k),

π_{0,k} ≤ exp(η_k) π_0 + ∫_{[−a,a]^c} f̃_k.

Given ε > 0 arbitrarily small, choose a large enough that ∫_{[−a,a]^c} f ≤ ε; then, for k large enough, we have ∫_{[−a,a]^c} f̃_k ≤ 2ε, implying that π_{0,k} ≤ exp(η_k) π_0 + 2ε. Using the fact that exp(η_k) → 1, since η_k = L δ_k → 0, and that ε > 0 is arbitrary, we get that lim sup_{k→∞} π_{0,k} ≤ π_0, concluding the proof.

We mention that, besides the discretization (33), we also considered a more straightforward discretization of the integral, effectively replacing Λ in (33) with Λ_0 defined as

Λ_0(v) ∶= δ_k Σ_{j=−k}^{k} exp(v_{j,k}).  (44)

The outputs returned by these two discretizations, (33) and (44), were very similar in our numerical experiments.

In this subsection we consider experiments where the background component could be assumed to be log-concave. The density fitting process and confidence band acquisition are exactly the same as in Section 2.2. We first consider four different situations, as listed in Table 12. Although the mixture distributions are identical to those in Table 3 for the symmetric case, we point out that π_0 is different here, as π_0 now corresponds to the largest possible log-concave component, as defined in (24). We again generate a sample of size n = 1000 from each model, and repeat each setting 1000 times.

We note that the output of our algorithm depends heavily on the estimation of the density f̂. When the bandwidth selected by cross-validation yields a density with a high frequency of oscillation, the largest log-concave component is very likely to be smaller than the correct value. For an illustrative situation, see Figure 7. In the event that such a situation happens, we recommend that the user look at a plot of f̂ before applying our procedure, with the possibility of selecting a larger bandwidth. This is what we did, for example, for the Carina and Leukemia datasets in Figure 10. In the simulations, to avoid the effect of this issue, we report the median value instead of the mean. These values are reported in Table 13. As can be seen, even with only the assumption of log-concavity, our method is accurate in estimating π_0, with estimation errors ranging from 0.001 to 0.007. Our method frequently achieves a smaller error in estimating π_0 than the comparison methods achieve in estimating θ_0 and θ, and does so with less information on the background component. For situations where the background component is specified incorrectly, we invite the reader to Section 5.1.

Figure 7: An illustrative situation that occurred in the course of our simulations, where a high frequency of oscillation of f̂ results in a significantly lower estimate of π_0: the Gaussian mixture density 0.85 N(0, 1) + 0.15 N(3, 1), in black, and the corresponding largest log-concave component h, in red, with the fitted density f̂, in orange, and the fitted largest log-concave component ĥ, in blue. Here, as before, f̂ is obtained by kernel density estimation with bandwidth chosen by cross-validation. The result based on this estimate is π̂_0 = 0.807, while the actual value is π_0 = 0.931.

Remark 3. We note that the SQP algorithm as implemented in the R package nloptr sometimes gives ĥ = 0 as the largest log-concave component, due to some parts of the estimated density f̂ being 0.
This situation happens occasionally when computing the lower confidence bound, and it is of course incorrect in situations like those in Table 12. When this situation occurs, we rerun the algorithm on the largest interval of non-zero f̂ values and report the largest log-concave component acquired on that interval.

Table 13: A comparison of various methods for estimating a log-concave background component in the situations of Table 12. Here we report the median values instead of the mean values (and also report the standard deviations, as before). As always, π̂_0 should be compared with π_0, the θ̂_0^X should be compared with θ_0, and the θ̂^X should be compared with θ.

In this subsection we examine real datasets of two-component mixtures where the background component could be assumed to be log-concave. We first consider the six real datasets presented in Section 2.3, but this time look for a background log-concave component. When the background is known, the numerical results for the Prostate and Carina datasets can be found in Table 14 and Figures 8(a) and 8(b). When the background is unknown, the numerical results for the HIV, Leukemia, Parkinson and Police datasets can be found in Table 15 and Figures 9(a)-(d).

In addition to the above six datasets, we also include here the Old Faithful Geyser dataset (Azzalini and Bowman, 1990). This dataset consists of 272 waiting times, in minutes, between eruptions of the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. We attempt to find the largest log-concave component of the waiting times. The results are summarized in the first row of Table 15 and in Figures 11(a) and 11(b). We want to note from this example that the curve ĥ_u acquired from f̂_u, leading to the computation of π̂_0^U, is not an upper confidence bound for h_0, as shown by Figure 11(a).

Remark 4. We observe through these real datasets that the largest log-concave background component usually has a higher weight than the largest symmetric background component of Section 2.3.

In this paper, we extended the approach of Patra and Sen (2016) to settings where the background component of interest is assumed to belong to one of three emblematic classes: symmetric, monotone, and log-concave. In each setting, we derived estimators for both the proportion and the density of the background component, established their consistency, and provided confidence intervals/bands. The important situation of an incorrectly specified background distribution, as well as other extensions, remain to be addressed, and we discuss them in this section.

As mentioned in Section 2.2 and Section 4.3, our method requires much less information than comparable methods, and is therefore much less prone to misspecification of the background component. In this subsection, we give an experiment illustrating this point. We consider the mixture model 0.85 T_6 + 0.15 N(3, 1). Instead of the correct null distribution T_6, we take it to be N(0, 1). This could happen in multiple testing settings, for example in the HIV dataset of Section 2.3. We consider both a symmetric background and a log-concave background for this model in Table 16, and report the fitted values of our method and the comparison methods in Table 17.
As can be seen, when estimating π_0, our estimator achieves an error of 0.002 when assuming a symmetric background and of 0.004 when assuming a log-concave background, less than any other method in the comparison that estimates θ_0 or θ. The heuristic estimator of Patra and Sen (2016) has a slightly higher error than our method, while the constant estimator of Patra and Sen (2016) and the estimators of Efron (2007) and Cai and Jin (2010) have large errors. The upper confidence bound of Meinshausen and Rice (2006) also becomes incorrect.

Figure 8: Estimated background log-concave component on the Prostate (z values) and Carina (radial velocity) datasets: the black curve represents the fitted density; the blue curve represents the computed ĥ, one of the largest log-concave components. Due to an unusually high frequency of oscillation in f̂ for the Carina dataset, we consider increasing the bandwidth from the one acquired by cross-validation; the corresponding result is shown in Figure 10(a).

Figure 9: Estimated background log-concave component on the HIV, Leukemia, Parkinson, and Police (z scores) datasets: the black curve represents the fitted density; the blue curve represents the computed ĥ, one of the largest log-concave components. Due to the high frequency of oscillation in f̂ for the Leukemia dataset, we consider increasing the bandwidth from the one acquired by cross-validation; the corresponding result is shown in Figure 10(b).

Figure 10: Carina (radial velocity) and Leukemia (z values) datasets, with the kernel density bandwidth chosen 'by hand' instead of by cross-validation: the black curve represents the fitted density with increased bandwidth; the blue curve represents the computed ĥ. We note that for the Carina dataset, the bandwidth was increased from 3.085 to 6, and π̂_0 changed from 0.550 to 0.600. For the Leukemia dataset, the bandwidth was increased from 0.124 to 0.25, and π̂_0 changed from 0.939 to 0.981.

Figure 11: Estimated background log-concave component on the Geyser (waiting times) dataset: the black curve represents the fitted density; the blue curve represents the computed ĥ; the gray curves (left) represent f̂_l and f̂_u; the light blue curves (left) represent ĥ_l and ĥ_u. (a) Geyser dataset with ĥ_l and ĥ_u. (b) Geyser dataset with only ĥ. Note from the left plot that ĥ_u is only used to compute π̂_0^U, and is not an upper bound for h.

Although not discussed in the main part of the paper, some combinations of the shape constraints considered earlier are possible. For example, one could consider extracting a maximal background that is symmetric and log-concave; or one could consider extracting a maximal background that is monotone and log-concave. As it turns out, these two combinations are intimately related. Mixtures of symmetric log-concave distributions are considered, for example, in (Pu and Arias-Castro, 2020).

All our examples were on the real line, corresponding to real-valued observations, simply because the work was in large part motivated by multiple testing, in which the sample stands for the test statistics. But the approach is more general. Indeed, consider a measurable space, and let D be a class of probability distributions on that space. Given a probability distribution µ, we can quantify how much there is of D in µ by defining

π_0 ∶= sup{π ∶ ∃ ν ∈ D s.t. µ ≥ π ν}.

For concreteness, we give a simple example in an arbitrary dimension d by generalizing the setting of a symmetric background component covered in Section 2.
Although various generalizations are possible, we consider the class -- also denoted S as in (3) -- of spherically symmetric (i.e., radial) densities with respect to the Lebesgue measure on R^d. It is easy to see that the background component proportion is given by

π_0 = ∫ h_0, where h_0(x) ∶= ess inf{f(y) ∶ ∥y∥ = ∥x∥},

the essential infimum being with respect to the uniform measure on the sphere, and, if π_0 > 0, the background component density is given by g_0 ∶= h_0/π_0.
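As an illustration of this multivariate extension, the following R sketch approximates h_0 and π_0 in dimension d = 2 for a density f given in closed form, by taking the minimum of f over a grid of directions on each sphere and integrating in polar coordinates; the truncation radius and grid sizes are arbitrary choices of ours.

```r
# Spherically symmetric background in d = 2: h0 depends only on r = |x|, with
# h0(r) = inf of f over the circle of radius r, and pi0 = int 2*pi*r*h0(r) dr.
radial_background <- function(f, r_max = 8, n_r = 400, n_theta = 720) {
  r <- seq(0, r_max, length.out = n_r)
  theta <- seq(0, 2 * pi, length.out = n_theta)
  h0 <- vapply(r, function(ri)
    min(f(ri * cos(theta), ri * sin(theta))), numeric(1))
  delta <- r[2] - r[1]
  pi0 <- delta * sum(2 * pi * r * h0)    # polar-coordinate integration
  list(r = r, h0 = h0, pi0 = pi0)
}

# Example: a bivariate Gaussian mixture whose radial part carries most weight.
f <- function(x, y) 0.9 * dnorm(x) * dnorm(y) +
  0.1 * dnorm(x, mean = 3) * dnorm(y, mean = 3)
radial_background(f)$pi0   # a bit above 0.9
```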
Acknowledgments. We are grateful to Philip Gill for discussions regarding the discretization of the optimization problem (25).

References
Distribution-free multiple testing
Distribution-free tests for sparse heterogeneous mixtures
A survey of cross-validation procedures for model selection
A look at some data on the Old Faithful Geyser
Controlling the false discovery rate: a practical and powerful approach to multiple testing
On the adaptive control of the false discovery rate in multiple testing with independent statistics
On some global measures of the deviations of density function estimates
Semiparametric estimation of a two-component mixture model
Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing
lpdensity: Local polynomial density estimation and inference
Simple local polynomial density estimators
Clustering with mixtures of log-concave distributions
A tutorial on kernel density estimation and recent advances
Nonparametric inference via bootstrapping the debiased estimator
Consistent cross-validated density estimation
Kernel estimation of densities with discontinuities or discontinuous derivatives
Estimation in mixtures of two normal distributions
An interactive web-based dashboard to track COVID-19 in real time
Size, power and false discovery rates
Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction
Empirical Bayes analysis of a microarray experiment
Local linear regression smoothers and their minimax efficiencies
A new algorithm for approximating the least concave majorant
Parameter recovery in two-component contamination mixtures: The L2 strategy
Operating characteristics and extensions of the false discovery rate procedure
A stochastic process approach to false discovery control
Sequential quadratic programming methods
Confidence bands in density estimation
Mathematical Foundations of Infinite-Dimensional Statistical Models
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
Minimal convex majorants of functions and Demyanov-Rubinov exhaustive super(sub)differentials
Smoothing spline density estimation: a dimensionless automatic algorithm
Smoothing spline density estimation: theory (The Annals of Statistics)
Kernel estimator and bandwidth selection for density and its derivatives: The kedd package
Robust Nonparametric Statistical Methods
Locally parametric nonparametric density estimation
Maximum likelihood estimation of the mixture of log-concave densities
Robust estimation of a location parameter
Robust Statistics
Inference for mixtures of symmetric distributions
Proportion of non-zero normal means: universal oracle equivalences and uniformly consistent estimators
Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons
The iterative convex minorant algorithm for nonparametric estimation
A generalized reflection method of boundary correction in kernel density estimation
A software package for sequential quadratic programming
Estimating the proportion of true null hypotheses, with application to DNA microarray data
A genomic pathway approach to a complex disease: Axon guidance and Parkinson disease
Mixture models: theory, geometry and applications
Multivariate normal mixtures: a fast consistent method of moments
Local likelihood density estimation
Flexible estimation of a semiparametric two-component mixture model with one parametric component
Mixture models: Inference and Applications to Clustering
Finite mixture models
Finite Mixture Models
Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
Numerical Optimization
On local likelihood density estimation
Estimation of a two-component mixture model with applications to multiple testing
An EM algorithm for fitting a mixture model with symmetric log-concave densities
Doubly robust internal benchmarking and false discovery rates for detecting racial bias in police stops
A synthetic view on structure and evolution of the Milky Way
False discovery rate control with unknown null distribution: is it possible to mimic the oracle?
Empirical choice of histograms and kernel density estimators
Recent progress in log-concave density estimation
Incorporating support constraints into nonparametric estimators of densities
A reliable data-based bandwidth selection method for kernel density estimation
An MM algorithm for estimation of a two-component semiparametric density mixture with a known component
Density Estimation for Statistics and Data Analysis
Gene expression correlates of clinical prostate cancer behavior
An asymptotically optimal window selection rule for kernel density estimates
A direct approach to false discovery rates
A survey of sampling from contaminated distributions
Cellular gene expression upon human immunodeficiency virus Type 1 infection of CD4+ T-cell lines
Velocity dispersion profiles of seven dwarf spheroidal galaxies
Inference and modeling with log-concave distributions
A Simplicial Algorithm for Concave Programming