Selection and the Distribution of Female Hourly Wages in the U.S.
Iván Fernández-Val, Franco Peracchi, Aico van Vuuren, Francis Vella
2018-12-21

We analyze the role of selection bias in generating the changes in the observed distribution of female hourly wages in the United States using CPS data for the years 1975 to 2020. We account for the selection bias from the employment decision by modeling the distribution of the number of working hours and estimating a nonseparable model of wages. We decompose changes in the wage distribution into composition, structural and selection effects. Composition effects have increased wages at all quantiles, while the impact of the structural effects varies by time period and quantile. Changes in the role of selection only appear at the lower quantiles of the wage distribution. The evidence suggests that there is positive selection in the 1970s, which diminishes until the late 1990s. This reduces wages at lower quantiles and increases wage inequality. After 2000 there appears to be an increase in positive sorting, which reduces the selection effects on wage inequality.

The dramatic increase in female wage inequality in the United States since the early 1980s (see, for example, Katz and Murphy 1992, Katz and Autor 1999, Lee 1999, Autor et al. 2008, Acemoglu and Autor 2011, Autor et al. 2016, Murphy and Topel 2016) has been accompanied by substantial changes in both female employment rates and the distribution of annual hours of work. Given the prominence that accounting for selection bias from employment decisions (see Heckman 1974, 1979) has had in empirical studies of the determinants of female wages, it seems natural to investigate its role in the evolution of wage inequality.
This paper examines the sources of changes in the distribution of female hourly real wage rates in the United States from 1975 to 2020 while accounting for movements in the distribution of annual hours of work and for individuals' locations within it. The inequality literature allocates wage changes to two sources. The first is the "structural effect", which captures the market value of an individual's characteristics. This includes skill premia such as the returns to education (see, for example, Welch 2000 and Murphy and Topel 2016), cognitive and noncognitive skills (Heckman, Stixrud and Urzua 2006), declining real minimum wages (see, for example, DiNardo et al. 1996 and Lee 1999), and the increasing use of non-compete clauses in employment contracts (Krueger and Posner 2018). The second source, referred to as the "composition effect", reflects differences across workers' observed characteristics, such as increases in educational attainment. Earlier papers (see, for example, Angrist et al. 2006 and Chernozhukov et al. 2013) have estimated these two effects under general conditions. However, as they focus on male wages, they ignore the role of selection. Understanding the role of selection in the context of female wage inequality is important for several reasons. First, as the impact of selection is frequently interpreted as reflecting sorting patterns, it is valuable from a policy perspective to understand how worker productivity has changed as an increasing proportion of women have entered the labor market. Second, the importance of accounting for selection bias in estimating the determinants of female wages suggests that an evaluation of the role of structural and composition effects requires an appropriate treatment of selection. Third, assessing the impact of selection on wages and inequality is particularly relevant when the composition of the working and nonworking populations has evolved as drastically as it has during our sample period.
Finally, understanding the impact of selection bias may provide policy makers with guidance as to which measures may be taken to reduce wage inequality.

[…] States over a period of increasing female wage inequality. Mulligan and Rubinstein (2008), hereafter MR, correct for selection in the female mean wage in the United States for the years 1975-1999 and argue that the sharp increase in female wages partly reflected the selected population of working women becoming increasingly productive in terms of unobservables. They also find that the pattern of sorting turned from negative to positive in the early 1990s. Evaluating the contribution of selection bias to the mean wage is straightforward in additive models, as the selection component can, under some assumptions, be separated, and the pattern of sorting is inferred from the coefficient on the selection correction term. In the nonseparable model required for estimating wage distributions, isolating the selection component is more difficult. Maasoumi and Wang (2019), hereafter MW, employ the copula-based estimator for quantile selection models of Arellano and Bonhomme (2017), hereafter AB. AB and MW define the selection effect as the difference between the observed wage distribution and the counterfactual wage distribution simulated from their models' estimates assuming 100% participation. For the years covered by both studies, MW reach a conclusion similar to MR's regarding the pattern of sorting. Blau et al. (2021) follow Olivetti and Petrongolo (2008), who use a "selection on unobservables" approach to impute wages for nonworkers based on propensity scores for employment. They also compute the predicted wage distribution assuming 100% participation. In contrast to MR and MW, they find a more modest role for selection and that sorting did not change sign over the sample period.
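In the additive model the selection component enters the selected mean as a separate term. The sketch below assumes the textbook bivariate-normal Heckman (1979) model with log wage Y = x'β + E and participation D = 1{z'γ + V > 0}; the argument names (`xb`, `zg`, `rho`, `sigma_e`, `sigma_v`) and parameter values are illustrative and do not come from MR or this paper. It computes the closed form E[Y | D = 1] = x'β + ρσ_E λ(z'γ/σ_V) and checks it by simulation. The sign of ρσ_E, the coefficient on the inverse Mills ratio λ, is what is read as the direction of sorting.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_mills(c: float) -> float:
    """lambda(c) = phi(c) / Phi(c): mean of a standard normal truncated below at -c."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def selected_mean(xb: float, zg: float, rho: float, sigma_e: float, sigma_v: float) -> float:
    """Closed-form E[Y | D = 1] in an (assumed) bivariate-normal Heckman model.

    Y = xb + E, D = 1{zg + V > 0}, corr(E, V) = rho. The selection term
    rho * sigma_e * lambda(zg / sigma_v) is additively separable from xb.
    """
    return xb + rho * sigma_e * inverse_mills(zg / sigma_v)


def simulated_selected_mean(xb, zg, rho, sigma_e, sigma_v, n=200_000, seed=0):
    """Monte Carlo check of selected_mean: average Y over draws with D = 1."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v = sigma_v * u1
        e = sigma_e * (rho * u1 + (1.0 - rho**2) ** 0.5 * u2)  # corr(E, V) = rho
        if zg + v > 0:  # selection rule D = 1
            total += xb + e
            count += 1
    return total / count


if __name__ == "__main__":
    args = dict(xb=2.5, zg=0.0, rho=0.5, sigma_e=0.6, sigma_v=1.0)
    print(selected_mean(**args))            # about 2.739: positive sorting raises the selected mean
    print(simulated_selected_mean(**args))  # close to the closed form
```

With ρ < 0 the selected mean falls below x'β, which is the negative-sorting case.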
We define the selection effect as the difference between the observed wage distribution and the wage distribution that would result under the participation process of the year with the lowest participation rate. We find that the direction of sorting did not change during the years considered by MR. We address several methodological and empirical issues regarding selection in the context of female wage inequality. Our methodological contributions are the following. First, we extend the estimator of Fernández-Val, Van Vuuren, and Vella (2021), hereafter FVV, for nonseparable models with censored selection rules. The FVV estimator uses the number of working hours, rather than the binary work decision, as the selection variable; here we allow the censoring point to vary with the individual's characteristics. This variation in censoring points captures differences in "fixed working costs" (see, for example, Cogan 1981). Second, we provide a procedure for decomposing changes in the wage distribution into structural, composition and selection effects in a nonseparable model that allows for selection from the choice of annual working hours. We contrast our decomposition approach with the corresponding exercise based on the Heckman (1979) selection model (HSM). Third, we extend our estimator to allow selection into annual hours to reflect two separate selection mechanisms, namely the choices of annual weeks and weekly hours. Fourth, we provide an estimator, motivated by the ordered treatment model of Heckman and Vytlacil (2007), which allows for bunching in annual hours or annual weeks, and we apply it through our decomposition method. Our empirical contributions, which follow, feature results based on the two most commonly employed Current Population Survey (CPS) data sets.
First, unlike MR and MW, who analyze wages for full-time full-year (FTFY) workers, we obtain a fuller picture of the evolution of the wage distribution by including all workers and accounting for selection from the hours of work decision. Second, we confirm previous findings, restricted to FTFY workers, regarding movements in the wage distribution: female wage growth at lower quantiles is modest, although the median wage has grown steadily, while gains at the upper quantiles are large and have increased female wage inequality. Finally, we provide new evidence regarding the role of selection. Changes in selection are especially important at the lower end of the wage distribution and have generally decreased wage growth and increased wage inequality. Although we are able to reproduce the sorting pattern estimated by MR and MW, we show that it reflects the identification assumptions they employ. We show that exploiting the variation in hours worked for identification produces results consistent with positive sorting for the whole sample period. An important empirical result relates to the pattern of sorting and its implications for the impact of selection on wages and inequality. We find clear evidence of positive sorting in the mid-1970s. The period 1975 to 2000 experiences a shift in the distribution of female annual hours of work, accompanied by a reduction in the level of positive sorting. These two forces decrease wages at lower quantiles and increase wage inequality. For the remainder of our sample period there appears to be a return to higher levels of positive sorting and a decrease in the impact of selection on wage inequality.

The rest of the paper is organized as follows. The next section discusses the data. Section 3 describes our empirical model and defines our decomposition exercise. It also provides alternative estimators employing ordered or multiple censored selection rules.
This section concludes with a comparison of our decomposition approach with that associated with the HSM. Section 4 presents the empirical results. Section 5 reconciles our results with those of MR, while Section 6 investigates the impact of changes in selection on wage inequality. Section 7 offers some concluding comments.

We employ the two most commonly analyzed micro-level data sets from the CPS, the Annual Social and Economic Supplement (ASEC) and the Merged Outgoing Rotation Groups (MORG). Appendix A of Lemieux (2006) provides a comparison of the two data sets. We employ both to contrast results and to allow comparisons with earlier studies.

We employ the ASEC for the 46 survey years from 1976 to 2021, which report annual earnings for the previous calendar year. Unless otherwise stated, we refer to the year for which the data are collected and not that of the survey. The 1976 survey is the first for which information on weeks worked and usual hours of work per week last year is available. To avoid issues related to retirement and ongoing educational investment, we restrict attention to those aged 24-65 years in the survey year. This produces an overall sample of 2,219,820 females, with annual sample sizes ranging from 33,924 in 1976 to 59,622 in 2001. Annual hours worked are defined as the product of weeks worked and usual weekly hours of work last year. Those reporting zero hours usually respond that they are not in the labor force in the week of the March survey. We define hourly wages as the ratio of reported annual labor earnings in the year before the survey, converted to constant 2019 prices using the consumer price index for all urban consumers, to annual hours worked. Hourly wages are unavailable for those not in the labor force. For the self-employed, unpaid family workers and the Armed Forces, annual earnings or annual hours tend to be poorly measured, and we exclude these groups from our sample.
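The ASEC sample construction described above can be sketched as follows. This is a minimal sketch, assuming a simplified record layout: the field names, the class-of-worker labels and the CPI values are hypothetical placeholders, not CPS variable names or actual CPI-U figures. It builds annual hours as weeks times usual weekly hours, deflates earnings to 2019 prices, applies the age and class-of-worker exclusions, and keeps observations with imputed earnings for hours but not for wages.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical class-of-worker labels; the CPS codes these differently.
EXCLUDED_CLASSES = {"self_employed", "unpaid_family", "armed_forces"}


@dataclass
class AsecRecord:
    """Hypothetical, simplified ASEC record (not the actual CPS variable names)."""
    age: int                        # age in the survey year
    class_of_worker: Optional[str]  # None for those out of the labor force last year
    weeks_worked: int               # weeks worked last year
    usual_weekly_hours: int         # usual hours per week last year
    annual_earnings: float          # nominal labor earnings last year
    earnings_imputed: bool = False


def annual_hours(rec: AsecRecord) -> int:
    """Annual hours = weeks worked x usual weekly hours."""
    return rec.weeks_worked * rec.usual_weekly_hours


def real_hourly_wage(rec: AsecRecord, cpi_year: float, cpi_2019: float) -> Optional[float]:
    """Annual earnings in 2019 prices divided by annual hours; None if undefined."""
    hours = annual_hours(rec)
    if hours <= 0 or rec.annual_earnings <= 0:
        return None
    return rec.annual_earnings * cpi_2019 / cpi_year / hours


def in_hours_sample(rec: AsecRecord, cpi_year: float, cpi_2019: float) -> bool:
    """Ages 24-65, excluded classes dropped; keeps dependent employees with a
    positive wage (imputed or not) and those out of the labor force (zero hours)."""
    if not 24 <= rec.age <= 65 or rec.class_of_worker in EXCLUDED_CLASSES:
        return False
    if rec.class_of_worker is None:
        return annual_hours(rec) == 0
    return real_hourly_wage(rec, cpi_year, cpi_2019) is not None


def in_wage_sample(rec: AsecRecord, cpi_year: float, cpi_2019: float) -> bool:
    """Imputed earnings contribute hours information only, not wages."""
    return (in_hours_sample(rec, cpi_year, cpi_2019)
            and rec.class_of_worker is not None
            and not rec.earnings_imputed)


if __name__ == "__main__":
    worker = AsecRecord(age=40, class_of_worker="private", weeks_worked=50,
                        usual_weekly_hours=40, annual_earnings=40_000.0)
    # Placeholder CPI values, not actual CPI-U figures.
    print(real_hourly_wage(worker, cpi_year=200.0, cpi_2019=255.0))  # 25.5
```

Keeping imputed-earnings records in the hours sample but not the wage sample mirrors the distinction the text draws between the two samples.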
Excluding these groups removes 5.4, 0.4 and 0.07 percent of the observations for the self-employed, unpaid family workers and the Armed Forces, respectively. The shares of the self-employed and the Armed Forces have trended upwards over the sample period, while that of unpaid family workers has trended downwards. These groups do not show any cyclical variation; the only exception is the number of self-employed during the Great Recession, which dropped considerably relative to total employment. We use observations with imputed wages for their values of working hours but do not include them in the wage sample. Restricting the sample to civilian dependent employees with positive hourly wages and people out of the labor force last year results in a sample of 2,055,063 females. The subsample of civilian dependent employees with positive hourly wages comprises 1,190,928 observations. A benefit of the ASEC is its extensive set of family background variables.

We use the years 1979 to 2019 for the MORG, taken from the CEPR extracts. The MORG contains information on hourly wages in the survey week for those paid by the hour and on weekly earnings from the primary job during the survey week for those not paid by the hour. Lemieux (2006) and Autor et al. (2008) […] 1980 (121,786) and the lowest in 2019 (91,647). The subsample of civilian dependent employees working in the reference week is 2,219,820 observations. This low figure, relative to the ASEC, is expected, as employees who did not work in the reference week may have worked in another week. Family background variables are only available since 1984, which restricts the family background characteristics to family size. Figure 1 confirms two observations made by Lemieux (2006) […] below, may be the relatively lower employment rate in the MORG. This implies that the MORG D1 is higher in the population distribution than the ASEC D1. The difference between the data sets decreases for the MORG in 1979-1981, with a corresponding smaller decrease for the ASEC.
The ASEC and the MORG then show similar growth, with the ASEC wage consistently below the MORG wage. As noted by Lemieux (2006), the period 1979 to 1984 displays a sharp increase in the residual variance in the MORG that is not found in the ASEC. […] is somewhat more complicated. Overall, the MORG has the larger increase, but the difference reflects wage movements for the period 1979 to 1984. The increase in the interdecile ratio after 1984 is relatively lower for the MORG. This is consistent with Lemieux (2006), who notes that the ASEC not only has higher wage dispersion but that its dispersion also increases faster over time. […] categories might also reflect selection effects. The failure to include those who do not work FTFY means that the selection effects in earlier studies may reflect movements from non-FTFY to FTFY work rather than from non-employment to FTFY work.

We consider a version of the HSM where the censoring rule for the selection process incorporates the information on annual hours worked rather than the binary employment/non-employment decision. The model has the form: […] where Y is the logarithm of hourly wages, H is annual hours worked, D is a selection indicator equal to one if H is not censored at µ(Z), X and Z are vectors of observable conditioning variables, g, h and µ are unknown functions, and E and V are, respectively, a vector and a scalar of potentially dependent unobservable variables with cumulative distribution functions (CDFs) $F_E$ and $F_V$. We assume that X is a, not necessarily strict, subset of Z, i.e. X ⊆ Z. We refer to equation (3) as the selection rule. It corresponds to censored selection with an unobserved censoring point: we observe the censoring status, D, but not the censoring point, µ(Z). Equations (2)-(3) can be considered a reduced-form representation for hours worked. The model is a nonparametric and nonseparable version of the Tobit type-3 model considered by FVV, extended to incorporate an unknown censoring threshold that is a function of Z.
This threshold is motivated by fixed labor costs measured in terms of hours: individuals only work if the desired number of hours exceeds a minimum number given by µ(Z). Cogan (1981) shows that fixed labor costs reduce the number of individuals working very few hours. We allow the fixed labor costs to vary with individual and household characteristics. Let ⊥⊥ denote stochastic independence. We assume (Assumption 1) that Z ⊥⊥ (E, V) and that the mapping v ↦ h(z, v) is strictly increasing a.s. Without loss of generality, we can normalize V to be standard uniformly distributed. The potential dependence between E and V implies that the independence of Z and E in the entire population does not exclude their dependence in the selected population. Lemma 1 extends FVV's result to our model: under Assumption 1, […]. The proof of the first statement follows from the same argument as in Lemma 1 of FVV. The assumption that Z is independent of (E, V) then proves the result, implying that V is an appropriate control function. The result $V = F_{H^* \mid Z}(H \mid Z)$ follows directly from the assumption that h is strictly increasing in its second argument and the normalization of the distribution of V. Identification of $F_{H^* \mid Z}$ follows from Buchinsky and Hahn (1998). In the Appendix, we propose an estimator of $F_{H^* \mid Z}$ based on distribution regression. This estimator is an alternative to the estimators of Buchinsky and Hahn (1998) and Chernozhukov and Hong (2002), which are based on quantile regression. The decompositions presented below require a wage distribution that incorporates the value of V and a statement regarding the region in which it is identified. To proceed, we denote the support of random variables and vectors by calligraphic letters, while lower case letters in parentheses indicate that the support is conditional on a stochastic vector taking a particular value; e.g. $\mathcal{Z}(x)$ is the support of Z | X = x.
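The control function $V = F_{H^* \mid Z}(H \mid Z)$ can be estimated by distribution regression. The sketch below is not the paper's Appendix estimator; it assumes Z takes finitely many values ("cells"), so that a distribution regression on a full set of cell indicators is saturated and its fitted probabilities reduce to within-cell empirical frequencies. Because every nonworker has H* ≤ µ(z) < H for each worker in the same cell, the empirical CDF of observed hours (with zeros for nonworkers) equals $F_{H^* \mid Z}$ at each worker's H. The function names are illustrative.

```python
from bisect import bisect_right
from collections import defaultdict


def cellwise_hours(cells, hours):
    """Sorted observed hours (0 for nonworkers) within each (assumed discrete) cell of Z."""
    by_cell = defaultdict(list)
    for z, h in zip(cells, hours):
        by_cell[z].append(h)
    return {z: sorted(hs) for z, hs in by_cell.items()}


def empirical_cdf(sorted_hours, h):
    """Share of the cell with observed hours <= h."""
    return bisect_right(sorted_hours, h) / len(sorted_hours)


def control_function(cells, hours):
    """V_hat = F_hat(H | Z) for workers (H > 0); None for nonworkers, whose V is censored."""
    by_cell = cellwise_hours(cells, hours)
    return [empirical_cdf(by_cell[z], h) if h > 0 else None for z, h in zip(cells, hours)]


def nonparticipation_rate(cells, hours):
    """1 - pi(z): share of each cell with zero hours, the lower bound of V among workers."""
    return {z: empirical_cdf(hs, 0) for z, hs in cellwise_hours(cells, hours).items()}


if __name__ == "__main__":
    cells = ["a"] * 4 + ["b"] * 4
    hours = [0, 0, 1000, 2000, 0, 500, 1500, 2000]
    print(control_function(cells, hours))       # [None, None, 0.75, 1.0, None, 0.5, 0.75, 1.0]
    print(nonparticipation_rate(cells, hours))  # {'a': 0.5, 'b': 0.25}
```

Every worker's estimated V exceeds the nonparticipation rate of its cell, which is the sample counterpart of the selection condition v > 1 − π(z). With continuous covariates, one would instead fit a binary model for 1{H ≤ h} on Z at each threshold h.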
We define the Local Average Structural Function (LASF) and the Local Distribution Structural Function (LDSF) as

$$\mu(x, v) := E[g(x, E) \mid V = v], \qquad G(y, x, v) := E[1\{g(x, E) \le y\} \mid V = v]. \quad (4)$$

They represent the mean and distribution of Y if all individuals with control function equal to v had observable characteristics equal to x. An argument similar to FVV shows that

$$\mu(x, v) = E[Y \mid X = x, V = v, D = 1], \qquad G(y, x, v) = \Pr(Y \le y \mid X = x, V = v, D = 1), \qquad (x, v) \in \mathcal{XV}^*,$$

where $\mathcal{XV}^* := \{(x, v) : h(z, v) > \mu(z) \text{ for some } z \in \mathcal{Z}(x)\}$. This set, referred to as the identification set by FVV, is identical to the support of (X, V) among the selected population. Lemma 1 implies that the LASF and LDSF equal the mean and distribution of the observed Y conditional on (X, V) and that they are identified. This follows directly from (E, V) ⊥⊥ Z and the fact that (x, v) ∈ $\mathcal{XV}^*$ implies that we can find a (z, v) combination for which h(z, v) > µ(z). We refer to FVV for a discussion of how the size of the identified set depends on the availability of exclusion restrictions on Z with respect to X.

There are different candidates for H in (2). As the ASEC provides both usual hours worked per week and annual hours, calculated as the product of weeks worked last year and the usual number of hours worked per week, we employ several alternatives. Although usual hours per week may be the ASEC variable closest to the hours decision in labor supply models (Killingsworth, 1983), it may also reflect whether the job has pre-set hours. Therefore, we employ the annual measure, which incorporates the weeks decision. As the extensive margin may capture whether an individual has worked a positive number of hours in the past year, we also investigate the use of the number of weeks worked. A theoretical motivation for this measure follows from search models in which offered wages depend positively on the job offer arrival rate and negatively on the separation rate (Burdett and Mortensen, 1998). As these rates also determine the number of weeks worked, this implies a relationship between weeks worked and wages. The appropriate censoring variable in the MORG is the number of hours worked in the reference week.
Note that this variable solves some of the problems mentioned above.

We consider counterfactual CDFs constructed by integrating the LDSF with respect to different joint distributions of the conditioning variables and control function. The observed CDF of Y in the selected population can be written as

$$G^s_Y(y) = \int G(y, x, v)\, dF^s_{Z,V}(z, v), \qquad F^s_{Z,V}(z, v) = \frac{\int 1\{\tilde z \le z, \tilde v \le v\}\, 1\{h(\tilde z, \tilde v) > \mu(\tilde z)\}\, dF_{Z,V}(\tilde z, \tilde v)}{\int 1\{h(\tilde z, \tilde v) > \mu(\tilde z)\}\, dF_{Z,V}(\tilde z, \tilde v)},$$

where x is the subvector of z corresponding to X, $F^s_{Z,V}$ denotes the joint CDF of (Z, V) in the selected population, and $F_{Z,V}$ denotes the joint CDF of Z and V in the entire population. The counterfactual CDFs are constructed by combining the CDFs G and $F_{Z,V}$ with the selection rule (3) for different groups, each group corresponding to a different time period or to a subpopulation defined by certain characteristics. Specifically, let $G_t$ be the LDSF in group t, let $F_{Z_k,V_k}$ be the joint CDF of Z and V in group k, and let $1\{h_r(z, v) > \mu_r(z)\}$ be the selection rule in group r. The counterfactual CDF of Y when G is as in group t, $F_{Z,V}$ is as in group k, and the selection rule is as in group r is defined as

$$G^s_{Y\langle t,k,r\rangle}(y) := \frac{\int G_t(y, x, v)\, 1\{h_r(z, v) > \mu_r(z)\}\, dF_{Z_k,V_k}(z, v)}{\int 1\{h_r(z, v) > \mu_r(z)\}\, dF_{Z_k,V_k}(z, v)}, \quad (5)$$

provided that the integrals are well defined. Since the mapping v ↦ h(z, v) is strictly monotonic, the condition $h_r(z, v) > \mu_r(z)$ in (5) is equivalent to

$$v > 1 - \pi_r(z), \quad (6)$$

where $1 - \pi_r(z) = F_{H^*_r \mid Z_r}(\mu_r(z) \mid z)$ is the probability of working fewer hours than the censoring point conditional on Z = z in group r, and $\pi_r(z)$ is the propensity of working in that group. Given $G^s_{Y\langle t,k,r\rangle}(y)$, the corresponding counterfactual quantile function (QF) is

$$q^s_{Y\langle t,k,r\rangle}(\tau) := \inf\{y : G^s_{Y\langle t,k,r\rangle}(y) \ge \tau\}, \qquad \tau \in (0, 1). \quad (7)$$

Under these definitions, the observed CDF and QF of Y for the selected population in group t are $G^s_{Y\langle t,t,t\rangle}$ and $q^s_{Y\langle t,t,t\rangle}$, respectively. Nonparametric identification of (5) and (7) depends on whether the integrals in (5) are well defined. They are when two conditions are met. First, if $\mathcal{Z}_k \subseteq \mathcal{Z}_r$, then $\pi_r$ is identified over all values of z in the integral. Second, when $(\mathcal{XV}_k \cap \mathcal{XV}^*_r) \subseteq \mathcal{XV}_t$, the LDSF is identified for all combinations over which we integrate. Here, $\mathcal{XV}^*_r$ denotes the support of (X, V) for the selected population in group r.
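Equations (5) and (7) can be approximated by averaging the LDSF over draws of (Z, V) from group k that satisfy group r's rule in its control-function form v > 1 − π_r(z), and then inverting the result on a grid. This is a minimal sketch, assuming equally weighted draws and user-supplied LDSF and propensity functions; the Gaussian LDSF, the two propensity functions and the grids below are hypothetical choices for illustration only, with X taken equal to Z.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def counterfactual_cdf(y, ldsf, zv_draws, pi_r):
    """G^s_{Y<t,k,r>}(y): mean of ldsf(y, z, v) over draws from F_{Z_k,V_k} kept by rule r."""
    num, den = 0.0, 0
    for z, v in zv_draws:
        if v > 1.0 - pi_r(z):  # h_r(z, v) > mu_r(z), written in terms of the control function
            num += ldsf(y, z, v)
            den += 1
    if den == 0:
        raise ValueError("no draws satisfy the selection rule")
    return num / den


def counterfactual_quantile(tau, ldsf, zv_draws, pi_r, y_grid):
    """q^s(tau) = inf{y : G^s(y) >= tau}, searched over an increasing grid."""
    for y in y_grid:
        if counterfactual_cdf(y, ldsf, zv_draws, pi_r) >= tau:
            return y
    raise ValueError("grid does not reach the requested quantile")


def example_ldsf(y, z, v):
    """Hypothetical LDSF: Y | X=x, V=v ~ Normal(2.5 + 0.4 x + 0.3 Phi^{-1}(v), 0.4^2), with X = Z."""
    mean = 2.5 + 0.4 * z + 0.3 * STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((y - mean) / 0.4)


# Hypothetical draws from F_{Z_k,V_k}: Z in {0, 1} equally weighted, V on a uniform midpoint grid.
DRAWS = [(z, (i + 0.5) / 200) for z in (0, 1) for i in range(200)]
Y_GRID = [1.0 + 0.01 * j for j in range(401)]

if __name__ == "__main__":
    high_participation = lambda z: 0.5 + 0.3 * z  # hypothetical pi_r(z)
    low_participation = lambda z: 0.3 + 0.3 * z
    print(counterfactual_quantile(0.5, example_ldsf, DRAWS, high_participation, Y_GRID))
    print(counterfactual_quantile(0.5, example_ldsf, DRAWS, low_participation, Y_GRID))
```

Because this LDSF shifts up with v (positive sorting), the lower-participation rule keeps only higher values of V and yields a higher counterfactual median.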
The identification conditions simplify when t, k and r each take one of two values, such as years 0 and 1, which is the case relevant for the decompositions. For example, what we need is that the employment rates in year 0, conditional on X, are lower than those in year 1. Using (7), we decompose the difference in the observed QF of Y for the selected population between any two groups, say group 1 and group 0, as

$$q^s_{Y\langle 1,1,1\rangle}(\tau) - q^s_{Y\langle 0,0,0\rangle}(\tau) = \underbrace{[q^s_{Y\langle 1,1,1\rangle}(\tau) - q^s_{Y\langle 1,1,0\rangle}(\tau)]}_{[1]} + \underbrace{[q^s_{Y\langle 1,1,0\rangle}(\tau) - q^s_{Y\langle 1,0,0\rangle}(\tau)]}_{[2]} + \underbrace{[q^s_{Y\langle 1,0,0\rangle}(\tau) - q^s_{Y\langle 0,0,0\rangle}(\tau)]}_{[3]}, \quad (8)$$

where [1] is a selection effect that captures changes in the selection rule given the joint distribution of Z and V, [2] is a composition effect that reflects changes in the joint distribution of Z and V, and [3] is a structural effect that reflects changes in the conditional distribution of Y given Z and V. These effects are relative to the base year. We stress that this definition of the selection effect differs from the standard definition, as discussed in Section 3.5.

The model can be extended to a multiple censored selection mechanism operating through both weeks and hours. The model has the form: […] The analysis of this model is similar to that above. However, both control functions are needed to, for example, calculate the LDSF, i.e. […] The identification conditions change to accommodate the fact that the support condition is defined over two control functions. Using the same notation as in Section 3.2, the support requirements for the counterfactuals are $\mathcal{ZV}_H\mathcal{V}_{W,k} \subseteq \mathcal{ZV}_H\mathcal{V}_{W,r}$ and […]. We acknowledge that there are circumstances under which this model collapses to the single censoring mechanism case; as these are somewhat obvious, we do not detail them here.

The models above employ control functions that assume the selection variable is continuous. However, both the number of weeks and the number of hours worked feature bunching at specific values (e.g. 40 hours and 52 weeks). The following model with ordered selection incorporates bunching: […] where the variables have similar interpretations as above.
The main difference between this model and those above is that it allows a discrete distribution of H at the expense of requiring separability in the selection process. We assume Z ⊥⊥ (E, V) and that V follows a standard uniform distribution. This model is related to the ordered choice model of Heckman and Vytlacil (2007, p. 4980), but unlike in their model, g(X, E) does not depend on H. It can also be interpreted as an extension of Newey (2007) to multiple ordered outcomes. We define the identification set as: […] This set collects the (x, p) combinations in the selected sample (i.e. H = h > 0) for which there is an (h, z) combination in $\mathcal{HZ}(x)$ such that $\mu_{h'}(z) = p$ for the propensity score of a value of H smaller than or equal to the observed value h. For example, if H = 3, this restriction is satisfied when $\mu_{h'}(z) = p$ for some $h' \in \{0, 1, 2, 3\}$. We define the LDSF as in (4). We prove the following lemma in the Appendix. […] Lemma 2 implies that $(x, p) \in \mathcal{XP}_K$ is also a sufficient condition for identification, as (see also Heckman and Vytlacil, 2007): […] We need additional assumptions to obtain counterfactual distributions. In the models with continuous censoring, we hold the value of the control function constant and change the lowest value at which the individual participates (see (6)). We cannot follow the same strategy here because V is not point identified. However, from the values of H and Z we know that V lies between $\mu_{H-1}(Z)$ and $\mu_H(Z)$. This implies that, among participants, individuals with H = 1 have the lowest values of V. Therefore, if we increase $\mu_0(Z)$ while leaving V unchanged, some individuals with H = 1 would no longer participate, although we do not know which ones. Hence, we integrate over the distribution of V and change the range of integration accordingly. We show in the Appendix that: […] where $\mu_K(z) := 1$ for any z. This equation is comparable to equation (7.2) of Heckman and Vytlacil (2007).
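Under the ordered rule, observing H = h places V in the interval (µ_{h−1}(Z), µ_h(Z)], with µ_K := 1 and, as a convention of this sketch, µ_{−1} := 0. This is a minimal sketch, assuming V is standard uniform and independent of Z as stated above and using hypothetical category probabilities for a single value of z. It computes the cutoffs µ_h = P(H ≤ h | z) and the share of each working category that still participates when µ_0 is raised, which is the change in the range of integration described above. It illustrates that idea only and is not the paper's counterfactual formula.

```python
from itertools import accumulate


def cutoffs(category_probs):
    """mu_h = P(H <= h | z) for h = 0..K, with mu_K set to exactly 1."""
    mu = list(accumulate(category_probs))
    mu[-1] = 1.0
    return mu


def retained_share(mu, h, mu0_new):
    """Share of category h >= 1 that still works when mu_0 is raised to mu0_new.

    Given H = h, V is uniform on (mu[h-1], mu[h]]; under the new rule only
    V > mu0_new participates, so we keep the part of that interval above mu0_new.
    """
    if h < 1:
        raise ValueError("category 0 does not participate")
    if mu0_new < mu[0]:
        raise ValueError("this sketch only covers raising mu_0")
    lower, upper = mu[h - 1], mu[h]
    if upper <= lower:
        return 0.0  # empty category
    kept = max(0.0, upper - max(lower, mu0_new))
    return kept / (upper - lower)


def counterfactual_participation(category_probs, mu0_new):
    """Participation after raising mu_0; equals 1 - mu0_new because V is uniform."""
    mu = cutoffs(category_probs)
    return sum(p * retained_share(mu, h, mu0_new)
               for h, p in enumerate(category_probs) if h >= 1)


if __name__ == "__main__":
    probs = [0.3, 0.1, 0.2, 0.4]  # hypothetical P(H = 0), ..., P(H = 3) at one z
    mu = cutoffs(probs)
    print([round(retained_share(mu, h, 0.35), 3) for h in (1, 2, 3)])  # [0.5, 1.0, 1.0]
    print(round(counterfactual_participation(probs, 0.35), 3))         # 0.65
```

Only the H = 1 category loses members here, and in the aggregate the retained mass equals 1 − µ_0', consistent with V being uniform.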
Based on (12), the counterfactual distribution when G is as in group t, $F_Z$ is as in group k, and the selection rule is as in group r is: […] The decompositions are identical to (8). The identification restrictions are related to the integrals in (13). The integral in the numerator of the second line of (13) can be written as […] For both of these terms to be identified for any h, we need $\mathcal{XP}^k_K \subseteq \mathcal{XP}^t_K$. For the identification of the integral in the numerator of the first line of (13), a similar argument gives $(\mathcal{XP}^r_K \cap \mathcal{XP}^k_K) \subseteq \mathcal{XP}^t_K$. We also need $\mathcal{Z}_k \subseteq \mathcal{Z}_r$; otherwise $\mu^r_0(z)$ is not identified. The identification restrictions imply, for example, that identifying $G^s_{Y\langle 1,1,0\rangle}$ requires $\mathcal{XP}^0_K \subseteq \mathcal{XP}^1_K$ and $\mathcal{Z}_1 \subseteq \mathcal{Z}_0$. The interpretation of these restrictions depends not only on the employment rates in years 0 and 1 but also on whether the propensity scores in year 1 overlap those of year 0. Despite these requirements, there is a benefit to using an ordered rather than a dichotomous selection rule. In the latter case, the restriction for $G^s_{Y\langle 1,1,0\rangle}$ would be that the support of the propensity score of employment for year 1, $\mu^1_0(Z)$, overlaps with that for year 0, $\mu^0_0(Z)$. For ordered selection, it is only necessary that one of the propensity scores in year 1, i.e. $\mu^1_h(Z)$, h = 1, ..., K, overlaps with the propensity score of employment for year 0, i.e. $\mu^0_0(Z)$.

The decomposition (8) […] To illustrate the difference with MR, suppose that the population model in period t is the following parametric version of the HSM: […] where the first element of $X_t$ is the constant term, and $E_t$ and $V_t$ are distributed independently of $(X_t, Z_t)$ as bivariate normal with zero means, variances $\sigma^2_{E_t}$ and $\sigma^2_{V_t}$, and correlation coefficient $\rho_t$. The counterfactual mean of Y for the selected population when the LASF is as in group t, $F_{Z,V}$ is as in group k, and the selection rule is as in group r, is: […] where […] denotes the LASF in group t.
The observed mean of $Y_t$ in the selected population, integrating over $Z_t$, is: […] where λ denotes the inverse Mills ratio. We decompose the difference $\mu^s_{Y\langle 1,1,1\rangle} - \mu^s_{Y\langle 0,0,0\rangle}$ between two time periods, t = 0 and t = 1, into selection, composition and structural effects. MR define the selection effect as: […] This comprises the following four elements: […] where […] is the counterfactual probability of selection in group k when the selection rule is as in group r, and Φ denotes the standard normal CDF. The first two elements in (18) […]

We now present the selection, composition and structural effects for our decomposition. Plugging the expression for $\mu_t(x, v)$ into (16) gives, after some straightforward calculations: […] Our selection effect is: […] The first element on the right-hand side of (19) is the effect on the average wage of changes in the distribution of observable characteristics of the selected population, holding the population distribution constant, resulting from applying the selection equation of period 0 to period 1. It is positive if those entering the selected population have characteristics associated with higher wages. This element is missing from the selection effect in (18). The second element is the corresponding effect for the unobservable characteristics and corresponds to the first element in (18). Our composition effect is: […] The first element on the right-hand side of (20) is the change in the average wage resulting directly from changes over time in the distribution of the observable characteristics, while the second element is the same as the second term in (18). Finally, our structural effect is: […] The first element on the right-hand side of (21) reflects the impact of changes over time in the returns to observable characteristics, while the second captures the type and degree of sorting and is the same as the sum of the third and fourth elements in (18). As the expectation involving the inverse Mills ratio is positive, this second element is positive when $\rho_1\sigma_{E_1} > \rho_0\sigma_{E_0}$.
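Under the parametric HSM the counterfactual means have a closed form once V is normalized to be uniform. This is a minimal sketch, assuming the selection equation takes the form D = 1{z'γ + V > 0}, so that the rule becomes v > 1 − π_r(z) with π_r(z) = Φ(z'γ_r/σ_{V_r}), the LASF is µ_t(x, v) = x'β_t + ρ_tσ_{E_t}Φ^{-1}(v), and E[Φ^{-1}(V) | V > 1 − π] = φ(Φ^{-1}(π))/π, the inverse Mills ratio. The discrete Z (which here coincides with X), the cell names and all parameter values are hypothetical. The three terms follow the ordering of (8), applied to the mean.

```python
from statistics import NormalDist

N01 = NormalDist()


def counterfactual_mean(t, k, r, params, z_dist, x_of_z):
    """mu^s_{Y<t,k,r>}: LASF of group t, distribution of Z of group k, selection rule of group r."""
    beta, rho_sigma = params[t]["beta"], params[t]["rho_sigma_e"]
    num = den = 0.0
    for z, weight in z_dist[k].items():
        pi = params[r]["pi"][z]                # participation propensity under rule r
        mills = N01.pdf(N01.inv_cdf(pi)) / pi  # E[Phi^{-1}(V) | V > 1 - pi]
        xb = sum(b * x for b, x in zip(beta, x_of_z[z]))
        num += weight * pi * (xb + rho_sigma * mills)
        den += weight * pi
    return num / den


def decompose_mean(params, z_dist, x_of_z):
    """Selection, composition and structural terms, ordered as in (8)."""
    m = lambda t, k, r: counterfactual_mean(t, k, r, params, z_dist, x_of_z)
    return {
        "total": m(1, 1, 1) - m(0, 0, 0),
        "selection": m(1, 1, 1) - m(1, 1, 0),
        "composition": m(1, 1, 0) - m(1, 0, 0),
        "structural": m(1, 0, 0) - m(0, 0, 0),
    }


# Hypothetical two-period example: Z is an education cell, X = (constant, college).
X_OF_Z = {"no_college": (1.0, 0.0), "college": (1.0, 1.0)}
Z_DIST = {0: {"no_college": 0.7, "college": 0.3}, 1: {"no_college": 0.4, "college": 0.6}}
PARAMS = {
    0: {"beta": (2.4, 0.5), "rho_sigma_e": 0.2, "pi": {"no_college": 0.4, "college": 0.6}},
    1: {"beta": (2.5, 0.6), "rho_sigma_e": 0.1, "pi": {"no_college": 0.6, "college": 0.8}},
}

if __name__ == "__main__":
    for name, value in decompose_mean(PARAMS, Z_DIST, X_OF_Z).items():
        print(f"{name:>12}: {value:+.4f}")
```

Setting ρσ_E to zero in both periods still leaves a nonzero selection term whenever the rule changes which observable cells are selected. This is the observable channel that, as noted above, is absent from MR's selection effect.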
Finally, consider a simple example illustrating that the two elements of the structural effect cannot, in general, be identified in nonseparable models. Consider a multiplicative version of the parametric HSM obtained by replacing (14) with […], and weaken the parametric assumption on the joint distribution of $E_t$ and $V_t$ by only […]. Then $\alpha_t$ and $\rho_t$ cannot be separately identified from the moment condition: […]

4 Empirical results

We start by describing the variables included in Z using the ASEC. Following MR, we include six indicator variables for the highest educational attainment reported, namely (i) 0-8 years of completed schooling, (ii) high school dropouts, (iii) high school graduates or 12 years of schooling, (iv) some college, (v) college, and (vi) advanced degree. We include a quartic polynomial in potential experience and interact it with the education levels. We use 5 indicator variables for marital status: (1) married, (2) separated, (3) divorced, (4) widowed, and (5) never married. […] for regions: northeast, midwest, south and west. Finally, we use linear terms for the number of children aged less than 5 years interacted with the indicator variables for marital status. For the MORG we use 5 levels of education, as the two lowest ASEC categories are merged. The variables Black, Hispanic, experience and region are the same. Only one indicator for marital status is used (married or not), and we employ household size, and its interaction with marital status, as the only household characteristic. With the exception of the household size and composition variables, all of the conditioning variables appear as determinants of both annual hours and hourly wages. While one might argue that household size and composition may affect hourly wage rates, we regard these exclusion restrictions as reasonable and note that similar restrictions have been employed previously (see, for example, MR).
However, given their potentially contentious use, we explore the impact of not using them below. The assumption that annual hours of work do not affect the hourly wage rate means that the variation in hours across individuals is a source of identification. Although our primary focus is the wage decomposition, we highlight the […] [Footnote: We employ the methodology described in MR for education and potential experience.] We only report the result of the latter here; see Figure 5. The selected points in the hours distribution are 0, 1000 and 2000. We find that many of the individual characteristics affect the level of annual hours worked. This is not particularly surprising given the large literature on labor supply documenting the roles of education and marital status in labor market participation. Perhaps more surprising is that the magnitude of the impact of these variables does not appear to change substantially over the sample period in either the ASEC or the MORG data. The exception is the exclusion restrictions, which became less important over time. This is consistent with Card and Hyslop (2021). Note that the level of education has increased drastically over the sample period, and this has had a substantial impact on the hours distribution. We also estimated models for annual weeks of work using distribution regression, and ordered models for annual weeks and annual hours using the ASEC. The wage equations are estimated for each year by distribution regression over the subsample with positive hourly wages. The conditioning variables are those in the hours equation, with the exception of the household size and composition variables. We also include the appropriate control function, its square, and interactions […]

We start with annual hours as the censoring mechanism. Figure 6A presents the decomposition for the mean, which increases by 25% over our sample period.
The total effect is driven by the composition effect, although in several instances the structural effect also contributes; the structural effect is generally negative and small relative to the composition effect. The contribution of the selection component is negative and small. A negative selection component implies that females are positively selected into employment and that those who entered employment between 1975 and 2020 were less productive than those already employed. Annual hours is an economically attractive censoring mechanism, as it exploits the variation in annual hours induced both by weekly hours and by weeks worked. However, selection may operate exclusively through either hours or weeks. We first address this issue by replacing annual hours with annual weeks as the selection mechanism. The results from these decompositions using the ASEC are in Figures 7A-7F. Their primary feature is their similarity to those for annual hours. This suggests that the control function from the annual hours censoring mechanism is highly correlated with that from annual weeks, despite the differences in their respective distributions. Now consider the decompositions for the MORG, recalling that wages are measured differently than in the ASEC and that the hours measure is based on the survey week. We implement our censored selection estimator using hours in the survey week as the censoring mechanism, noting that only a subset of the exclusion restrictions …

Our results from Section 4.2 seem robust to the use of either hours or weeks as the selection variable in the censored selection model. Figures 11A-11C report the decomposition for the double selection mechanism. There continues to be no evidence of selection above the median, so we report the decompositions for D1, Q1 and the median. While there are some differences between these figures and those for selection using only annual hours or annual weeks, they are relatively small.
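The three components can be read as a telescoping sum over counterfactual distributions that switch, one at a time, the wage structure (t), the distribution of characteristics (k) and the participation rule (r) between the base and final years. The sketch below assumes one particular ordering, with the selection effect defined as the difference between the observed distribution and the counterfactual in which women participate as in the base year; the order in which the structural and composition switches are applied is our assumption, and the numbers are invented for illustration.

```python
def decompose(q: dict[tuple[int, int, int], float]) -> dict[str, float]:
    """Telescoping split of a change in a distributional statistic.

    q[(t, k, r)] is the statistic (mean, quantile, ...) of the counterfactual
    wage distribution combining the wage structure of period t, the
    characteristics of period k and the participation rule of period r,
    with 0 = base year and 1 = final year.
    """
    selection = q[(1, 1, 1)] - q[(1, 1, 0)]    # observed vs. base-year participation
    composition = q[(1, 1, 0)] - q[(1, 0, 0)]  # final vs. base-year characteristics
    structural = q[(1, 0, 0)] - q[(0, 0, 0)]   # final vs. base-year wage structure
    total = q[(1, 1, 1)] - q[(0, 0, 0)]
    return {"total": total, "structural": structural,
            "composition": composition, "selection": selection}


# Invented log-wage means, for illustration only.
effects = decompose({(0, 0, 0): 2.00, (1, 0, 0): 1.98,
                     (1, 1, 0): 2.27, (1, 1, 1): 2.25})
```

By construction the components sum to the total change. With these invented numbers the selection term is negative, the sign interpreted above as positive selection into employment.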
These results suggest that the unobservables which increase participation on any margin (usual weeks, usual hours, hours in the survey week) are all highly correlated. To …

We first explore the term $\rho_t \sigma_{E_t}$ in (17). We estimate the HSM using the MR sample and exclusion restrictions to obtain the results in Figure E… This contrasts with MR. The three obvious causes are the use of the normality assumption in the HSM, the identifying power introduced through hours as a censoring variable in the selection equation, and the nonseparable nature of our model. To address these issues we first estimate the model using a parametric approach which relies on normality but which exploits the variation in hours for identification purposes. We employ the Vella (1993) …

To more closely correspond to the HSM, we divide the generalized residual by the estimated standard deviation of working hours, $\sigma_{V_t}$. The only difference from the HSM is the use of the Tobit generalized residual rather than the inverse Mills ratio. We plot the corresponding coefficient on the Tobit generalized residual in Figure 9-B. The coefficient on the Tobit generalized residual also estimates $\rho_t \sigma_{E_t}$. Two striking features are revealed in Figure 9-B. First, under normality the estimates of $\rho_t \sigma_{E_t}$ and the coefficient on the generalized residual should be identical. However, the estimates are very different and, most importantly, the coefficient estimate is always positive. As there is no reason that departures from normality would bias the estimate of $\rho_t \sigma_{E_t}$ and the coefficient on the Tobit generalized residual in the same manner, one could interpret the difference in the estimates as evidence of non-normality.

Footnote 12: Hirsch (2005) provides empirical evidence supporting this assumption.
However, recall that the Tobit generalized residual also exploits variation in the hours variable for identification purposes, and this could contribute to the differences in the signs and the behavior of the two coefficients. Second, the pattern of movement in the coefficient on the generalized residual is almost identical to that of the average derivative of our control function, despite the drastically different ways in which the two are computed. The two procedures are very different, but each exploits the variation in hours as a means of identification. While the use of the variation in hours as the source of identification appears to be the cause of the differences with MR, departures from normality may also be responsible. The final approach we explore is the use of the propensity score as the control function, noting that we allow it to enter the wage equation in a nonseparable manner (see, for example, Newey 2007). The propensity score employs the exclusion restrictions as the sole source of identification. We estimated the model and computed the average derivative of wages with respect to the propensity score. The results are presented in Figure 9-C. This derivative also changes sign as we move through the sample period and behaves similarly to $\rho_t$. We conclude that the differences between our results and MR in terms of the relationship between $E_t$ and $V_t$ are due to the use of the variation in hours, which appears to identify a different pattern of sorting.

… This produces a relatively larger positive value of $E_t$, and that individual will have relatively higher hours and wages. In this setting the value of the inverse Mills ratio will be the same for both individuals, while the Tobit generalized residual of the individual with the higher value of $H$ will be greater than that of the other. This suggests that the Tobit generalized residual captures information regarding "sorting" into hours which is ignored by the inverse Mills ratio.
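The contrast between the inverse Mills ratio and the Tobit generalized residual for two workers with the same characteristics but different hours can be checked numerically. The sketch below assumes a normal Tobit for annual hours, $H = \max(H^*, 0)$ with $H^* = z'\gamma + v$ and $v \sim N(0, \sigma^2)$, uses the participation index $z'\gamma/\sigma$ implied by that Tobit for the inverse Mills ratio, and returns the generalized residual already divided by $\sigma$. The index and $\sigma$ values are made up.

```python
import math


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_mills_ratio(index: float) -> float:
    """Heckman control phi(index) / Phi(index) for a participant."""
    return norm_pdf(index) / norm_cdf(index)


def tobit_generalized_residual(hours: float, xb: float, sigma: float) -> float:
    """E[v / sigma | H] in the Tobit H = max(xb + v, 0), v ~ N(0, sigma^2)."""
    if hours > 0:
        # Uncensored: the standardized residual itself.
        return (hours - xb) / sigma
    # Censored at zero: E[v/sigma | v <= -xb] = -phi(xb/sigma) / (1 - Phi(xb/sigma)).
    c = xb / sigma
    return -norm_pdf(c) / (1.0 - norm_cdf(c))


# Two workers with the same (hypothetical) index but different annual hours.
xb, sigma = 1000.0, 800.0
imr = inverse_mills_ratio(xb / sigma)                    # identical for both workers
gr_low = tobit_generalized_residual(1000.0, xb, sigma)   # 0.0
gr_high = tobit_generalized_residual(2000.0, xb, sigma)  # 1.25
```

The inverse Mills ratio depends only on the index, so it cannot distinguish the two workers, whereas the generalized residual increases with observed hours.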
Moreover, the inverse Mills ratio, unlike the Tobit generalized residual, is unable to explain the variation in wages across these two individuals. It is important to explore why $\rho_t$ might change sign in the models identified solely by exclusion restrictions. A negative $\rho_t$ implies that the working individuals with the lowest probabilities of participation should have the lowest observed wages among individuals with the same observed characteristics relevant for wages, $X$. The reverse is true for a positive $\rho_t$. We explore this by estimating a wage regression identical to the second stage of MR while replacing the inverse Mills ratio with a dummy variable for a child below the age of 5 years. The impact of having a "young child" was negative until 1982, at which time it turned, and remained, positive. This corresponds to a period, also reported by Card and Hyslop (2021), in which the magnitude of the negative impact of a "young child" on the employment decision decreased. While we acknowledge the presence of other ongoing factors, this change in the impact of a "young child" could generate a change in the sign of $\rho_t$. For example, in the absence of other influences, the large positive influence of a "young child" on the value of the inverse Mills ratio, combined with a negative correlation between "young child" and wages, would produce a negative value of $\rho_t$. In contrast, a decreasing effect of "young child" on participation would produce a smaller value of the inverse Mills ratio and that, combined with a positive correlation between "young child" and wages, would produce a positive $\rho_t$. We stress that we regard the above discussion as suggestive rather than conclusive. Our objective is to consider the possible causes of the differences in the results from the two control functions. The evidence suggests that part … in the sample period.
We provide the decompositions of changes in inequality using annual hours as the censored selection variable for the ASEC data, hours in the survey week for the MORG, and annual weeks as the ordered selection variable for the ASEC. For each of these models and selection rules we decompose the interquartile and interdecile ratios. Those for annual hours using the ASEC are reported in Figure 13 and those for the MORG in Figure 14. The interquartile ratio is driven by each of the components. Neither the composition nor the structural effect dominates throughout the sample period. The selection effect contributes throughout the period and clearly increases inequality. The interdecile ratio is driven primarily by the structural effect, especially during the drastic increase at the beginning of the sample period. The selection effect is clearly important and frequently more important than the composition effect. For the MORG the conclusions regarding the structural and composition effects are similar to those for the ASEC, while the selection effects are slightly smaller. This reflects the smaller selection effects at lower quantiles in the wage decompositions (Figures 6 and 8). The evidence for both data sets supports the conclusion that selection has a modest but important impact on wage inequality that varies in magnitude over the sample period. As the wage decompositions based on the ordered selection rule suggested that selection was more important than in the censored selection models (see Figure 12), we now examine whether this carries over to the inequality decompositions. We do not report the results, but note that the evidence is similar to that for the censored selection rule. Our results indicate that, as an increasing number of females have entered the labor market, they have reduced wages in the lower part of the wage distribution while having no impact on wages above Q2 (the median). This increases measures of inequality based on ratios of upper to lower quantiles.
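The interquartile and interdecile ratios are ratios of quantiles, Q3/Q1 and D9/D1. A minimal sketch computing them from a sample of hourly wages follows; in the decompositions the same ratios would instead be evaluated on each counterfactual quantile function. The quantile convention used here (the smallest value at which the empirical CDF reaches $\tau$) is one common choice and is an assumption.

```python
import math


def empirical_quantile(sample: list[float], tau: float) -> float:
    """Smallest sample value y with empirical CDF F_n(y) >= tau, for 0 < tau <= 1."""
    ys = sorted(sample)
    # The tolerance guards against tau * n landing a hair above an integer.
    k = max(math.ceil(tau * len(ys) - 1e-12), 1)
    return ys[k - 1]


def inequality_ratios(wages: list[float]) -> dict[str, float]:
    """Interquartile (Q3/Q1) and interdecile (D9/D1) ratios of hourly wages."""
    def q(tau: float) -> float:
        return empirical_quantile(wages, tau)
    return {"interquartile": q(0.75) / q(0.25), "interdecile": q(0.90) / q(0.10)}


# Illustrative wages 1, 2, ..., 20 (not data from the paper).
ratios = inequality_ratios([float(w) for w in range(1, 21)])
```

Because a selection effect that lowers only the lower quantiles shrinks the denominators, it raises both ratios, which is the mechanism described above.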
There are two potential reasons why selection increases inequality, depending on whether the observed or the unobserved characteristics of those participating have changed over time. However, an examination of education levels, for example, suggests that observed characteristics have played a minor role. In particular, we find that women with more than a high school degree were more likely to participate over the whole sample period and that this did not change over time. This suggests that our results reflect changing unobserved characteristics. The selection effect captures the difference between the observed wage distribution and the counterfactual in which women participated as in 1975. Our decomposition method presented in (5) imposes that this difference captures the exit of females with lower values of the control function. Figure E.5 reveals a strong positive relationship between wages and the control function, suggesting that the selection effects reflect that women entering the labor market were less productive than observationally identical women participating in 1975. Our evidence of positive sample selection over the whole period implies that the decision to work is largely based on economic motivations. However, as employment rates have increased, sorting on economic grounds has diminished. This is consistent with the explanation provided by AB for the U.K. labor market. It is also consistent with the results above that the conventional household and family background characteristics have become less important in explaining participation and hours worked. The reduction in positive sorting describes the changes in the hours distribution from the mid-1980s to the end of the 1990s. Blau et al. (2021) argue that the booming economy and welfare reform may have played an important role in the 1990s. Our collective evidence suggests that after 2000 there was an increase in positive selection. This supports the evidence in Blau et al. (2021). Towards the end of the sample period, the impact of selection on inequality and, more generally, on wages appears to have returned to 1975 levels.

This paper documents the changes in female real wages over the period 1975 to 2020. We decompose these changes into structural, composition and selection components by estimating a nonseparable model with selection. Female wage growth at lower quantiles is modest, although the median wage has grown steadily. The increases at the upper quantiles are substantial and reflect increasing skill premia. These changes have resulted in a substantial increase in female wage inequality. As our sample period is associated with large changes in the participation rates and hours of work of females, we explore the role of changes in "selection" in wage movements. We find that the impact of these changes is to decrease the wage growth …

… for some $h \in (0, \infty)$ such that $P(T = 1) > 0$. The estimator of the LASF is $\hat\mu(x, v) = w(x, v)'\hat\beta$, where $w(x, v)$ is a $d_w$-dimensional vector of transformations of $(x, v)$ with good approximating properties, and $\hat\beta$ is the OLS estimator

$$\hat\beta = \Big(\sum_{i=1}^{n} T_i \hat W_i \hat W_i'\Big)^{-1} \sum_{i=1}^{n} T_i \hat W_i Y_i,$$

where $\hat W_i = w(X_i, \hat V_i)$. The estimator of the LDSF is $\hat G(y, x, v) = \Lambda(w(x, v)'\hat\beta(y))$, where $y \in \mathbb{R}$ and $\hat\beta(y)$ is the logistic distribution regression estimator

$$\hat\beta(y) = \arg\max_{b \in \mathbb{R}^{d_w}} \sum_{i=1}^{n} T_i \Big[ 1\{Y_i \le y\} \log \Lambda(\hat W_i' b) + 1\{Y_i > y\} \log\big(1 - \Lambda(\hat W_i' b)\big) \Big].$$

Finally, in the third step we use (6) to estimate the counterfactual CDF (5) by

$$\hat G^s_{Y_{t,k,r}}(y) = \frac{1}{n^s_{kr}} \sum_{i=1}^{n} \Lambda\big(w(X_i, \hat V_i)'\hat\beta_t(y)\big)\, 1\{\hat V_i > 1 - \hat\pi_r(Z_i)\},$$

where the average is taken over the sample values of $\hat V_i$ and $Z_i$ in group $k$, $n^s_{kr} = \sum_{i=1}^{n} 1\{\hat V_i > 1 - \hat\pi_r(Z_i)\}$, $\hat\beta_t(y)$ is the logistic distribution regression estimator for group $t$ from the second step, and $\hat\pi_r(z)$ is the estimator of the propensity score of selection for group $r$ from the first step. Given $\hat G^s_{Y_{t,k,r}}$, we estimate the counterfactual QF (7) by

$$\hat q^s_{Y_{t,k,r}}(\tau) = \int_0^\infty 1\{\hat G^s_{Y_{t,k,r}}(y) \le \tau\}\,dy - \int_{-\infty}^0 1\{\hat G^s_{Y_{t,k,r}}(y) > \tau\}\,dy.$$

Following FVV, inference is based on the weighted bootstrap (Praestgaard and Wellner 1993).
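The third step, plugging the estimated LDSF into the counterfactual CDF and inverting it into quantiles, can be sketched in a few lines. In this sketch the fitted LDSF $\Lambda(w(x, v)'\hat\beta_t(y))$ and the propensity score $\hat\pi_r$ are passed in as plain functions, the group-$k$ sample is a list of $(X_i, Z_i, \hat V_i)$ triples, and the quantile integral is approximated on a finite grid $[lo, hi]$ outside of which the CDF is assumed to equal 0 or 1. The function names and the grid approximation are our own choices, not the authors' implementation.

```python
from typing import Callable, Iterable, Tuple

Obs = Tuple[float, object, float]  # (x, z, v_hat) for one individual in group k


def counterfactual_cdf(y: float,
                       ldsf_t: Callable[[float, float, float], float],
                       pscore_r: Callable[[object], float],
                       sample_k: Iterable[Obs]) -> float:
    """Average of G_t(y, X_i, V_i) over group-k observations that would
    participate under the period-r rule, i.e. with V_i > 1 - pi_r(Z_i)."""
    total, n_selected = 0.0, 0
    for x, z, v in sample_k:
        if v > 1.0 - pscore_r(z):
            total += ldsf_t(y, x, v)
            n_selected += 1
    if n_selected == 0:
        raise ValueError("no observation is selected under the period-r rule")
    return total / n_selected


def quantile_from_cdf(tau: float, cdf: Callable[[float], float],
                      lo: float, hi: float, n_grid: int = 1000) -> float:
    """Grid version of q(tau) = int_0^inf 1{G <= tau} dy - int_-inf^0 1{G > tau} dy.

    If G(y) = 0 below lo and G(y) = 1 above hi, that formula equals
    lo + int_lo^hi 1{G(y) <= tau} dy for tau in (0, 1); the remaining
    integral is approximated with a midpoint rule on n_grid cells.
    """
    h = (hi - lo) / n_grid
    count = sum(1 for j in range(n_grid) if cdf(lo + (j + 0.5) * h) <= tau)
    return lo + h * count
```

With a logistic LDSF, `ldsf_t` would return the logistic function evaluated at the fitted index, and `quantile_from_cdf(tau, lambda y: counterfactual_cdf(y, ldsf_t, pscore_r, sample_k), lo, hi)` would give the counterfactual quantile at `tau`. The rewriting `lo + int_lo^hi` avoids evaluating the CDF outside the grid while remaining equal to the original formula.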
The weighted bootstrap obtains the bootstrap version of the estimator of interest by repeating all the estimation steps with sampling weights drawn from a nonnegative distribution with mean and variance equal to one (e.g., the standard exponential).

C Proof of Lemma 2

The first equality is by definition. The second equality uses $V \sim U(0, 1)$ and the third equality uses the independence of $(E, V)$ and $Z$. The final equality uses the definitions of $Y$ and $H$, and the expression is identified because $(x, p) \in \mathcal{XP}_K$.

Adapting the representation of the distribution of the observed $Y$ in Section 3.2 to the ordered selection rules yields

$$G^s_Y(y) = \int_{\mathcal{Z}_k} \int G(y, x, v)\, 1\{v > \mu_0(z)\}\, dv\, dF_Z(z),$$

where the second equality uses that the interval $(\mu_0(z), 1]$ is the union of the disjoint intervals $(\mu_{h-1}(z), \mu_h(z)]$, $h = 1, \ldots, K$.

References

Skills, tasks and technology: Implications for employment and earnings
Quantile regression under misspecification, with an application to the U.S. wage structure
Quantile selection models with an application to understanding changes in wage inequality
Trends in U.S. wage inequality: Revising the revisionists
The contribution of the minimum wage to US wage inequality over three decades: A reassessment
The impact of selection into the labor force on the gender wage gap
The great reversal in the demand for skill and cognitive tasks
Trouble in the tails? What we know about earnings nonresponse thirty years after Lillard, Smith, and Welch
The relationship between wages and weekly hours of work: The role of division bias
Quantile regression model with unknown censoring point
An alternative estimator for the censored quantile regression model
Wage differentials, employer size and unemployment
Female earnings inequality: the changing role of family characteristics on the extensive and intensive margins
Distribution regression with sample selection, with an application to wage decompositions in the UK
Inference on counterfactual distributions
Semiparametric estimation of structural functions in nonseparable triangular models
Three-step censored quantile regression and extramarital affairs
Fixed costs and labor supply
Labor market institutions and the distribution of wages, 1973-1992: A semiparametric approach
Nonseparable sample selection models with censored selection rules
Hours worked and the U.S. distribution of real annual earnings 1976-2019
Integrated public use microdata series
Decomposition methods in economics
Shadow prices, market wages and labor supply
Sample selection bias as a specification error
The effects of cognitive and noncognitive abilities on labor market outcomes and social behavior
Econometric evaluation of social programs, part II
Why do part-time workers earn less? The role of worker and job skills
Identification and estimation of triangular simultaneous equations models without additivity
Changes in the wage structure and earnings inequality
Changes in relative wages, 1963-1987: Supply and demand factors
Labor Supply
A proposal for protecting low-income workers from monopsony and corruption
Wage inequality in the United States during the 1980s: Rising dispersion or falling minimum wage?
Increasing residual wage inequality: Composition effects, noisy data, or rising demand for skill?
The gender gap between earnings distributions
Household surveys in crisis
Selection, investment, and women's relative wages over time
Human capital investment, inequality, and economic growth
Nonparametric continuous/discrete choice models
Matching as a tool to decompose wage gaps
Unequal pay or unequal employment? A cross country analysis of gender gaps
Exchangeably weighted bootstraps of the general empirical process
A simple estimator for models with censored endogenous regressors
Growth in women's relative wages and inequality among men: One phenomenon or two?

We provide an estimator of $F_{H^* \mid X, Z}$ based on distribution regression. This is an alternative to Buchinsky and Hahn (1998) and Chernozhukov and Hong (2002), which developed estimators based on quantile regression. Start with a distribution regression model for $F_{H^* \mid Z}$, in which $F_{H^* \mid Z}(h \mid z)$ is given by $\Lambda$ applied to a linear index in $r(z)$ with coefficients that vary with $h$, where $\Lambda$ is a known link function and $r(z)$ is a $d_r$-dimensional vector of transformations of $z$ with good approximating properties. We also assume a binary response model for the propensity score of selection, $\pi(z) = \Lambda(p(z)'\delta)$. Let $\{(H_iY_i, H_i, D_i, Z_i)\}_{i=1}^{n}$ be a random sample of $(HY, H, D, Z)$, where $HY$ denotes that $Y$ is only observed when $D = 1$. The proposed estimator consists of two steps:

1. Estimation of $\pi(z)$ using binary regression in the entire sample: $\hat\pi(z) = \Lambda(p(z)'\hat\delta)$, where $\hat\delta$ is the binary regression estimator.
2. Estimation of $F_{H^* \mid Z}$ by distribution regression with sample selection correction in the selected sample.

We estimate the LASF $\mu(x, v)$ and the LDSF $G(y, x, v)$ for each group using flexibly parametrized ordinary least squares (OLS) and distribution regressions, where the unknown control function is replaced by its estimator $\hat V_i = \hat F_{H^* \mid Z}(H_i \mid Z_i)$ from the previous step. For reasons explained in FVV, we estimate over a sample trimmed with respect to the censoring variable $H$. We employ the following trimming indicator among the selected sample: $T = 1\{0 < H \le h\}$
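One way to assemble the estimated control function from the two first-step estimates uses the identity $F_{H^* \mid Z}(h \mid z) = 1 - \pi(z) + \pi(z)\,P(H \le h \mid Z = z, H > 0)$ for $h > 0$, which holds when $H = \max(H^*, 0)$ and $H^*$ is continuous. The sketch below applies it to a single worker, taking the fitted propensity score and the fitted conditional CDF of hours among workers as given numbers, and adds the trimming indicator $T$. It illustrates the identity under those assumptions and is not necessarily the exact selection correction used in the paper; the function names are hypothetical.

```python
def control_function(pscore: float, cdf_selected: float) -> float:
    """V = F_{H*|Z}(H | Z) for a worker with H > 0.

    For h > 0, F_{H*|Z}(h|z) = P(H* <= 0|z) + P(0 < H* <= h|z)
                             = (1 - pi(z)) + pi(z) * P(H <= h | Z = z, H > 0),
    so V combines the participation probability pi(z) with the conditional
    CDF of hours among workers evaluated at the worker's own hours.
    """
    if not (0.0 < pscore <= 1.0 and 0.0 <= cdf_selected <= 1.0):
        raise ValueError("probabilities must lie in the unit interval")
    return (1.0 - pscore) + pscore * cdf_selected


def keep_for_estimation(hours: float, h_max: float) -> bool:
    """Trimming indicator T = 1{0 < H <= h_max} used in the second step."""
    return 0.0 < hours <= h_max


# Hypothetical fitted values for one worker: participation probability 0.8,
# and half of observationally identical workers work at most her hours.
v_hat = control_function(pscore=0.8, cdf_selected=0.5)
```

For any worker with a positive conditional CDF value the result exceeds $1 - \pi(z)$, which matches the selection rule $V > 1 - \pi(Z)$ used in the counterfactual CDF.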