key: cord-0445834-7fkfs3f0
title: Pay Transparency and Cracks in the Glass Ceiling
authors: Duchini, Emma; Simion, Stefania; Turrell, Arthur
date: 2020-06-25

Each year since 2018, more than 10,000 UK firms have been required to publicly disclose their gender pay gap and gender composition. This paper studies how this transparency policy affects the occupational outcomes and wages of male and female workers. Theoretically, pay transparency represents an information shock that alters the bargaining power of male and female employees vis-à-vis the firm in an asymmetric way. As women are currently underpaid, this shock may improve women's relative outcomes. We test these theoretical predictions using a difference-in-difference strategy that exploits the variation in the UK mandate across firm size and time. Our results show that pay transparency increases the probability that women are hired in above-median-wage occupations by 5 percent compared to the pre-policy mean. Additionally, it leads to a 2.8 percent decrease in male real hourly pay in treated firms compared to control ones. Combining the difference-in-difference strategy with a text analysis of job listings, we also find suggestive evidence that treated firms in industries with a high gender pay gap become more likely to post wage information than firms in the control group.

The 4th of April 2018 was the first deadline for more than 10,000 UK firms to publish statistics on their gender pay gaps. Until then, fewer than 3 percent of UK firms had ever publicly disclosed this information (Downing et al. 2015). The following day, all national British newspapers commented on the figures. The second deadline fell in April 2019 and again drew significant media attention (Financial Times 2019, BBC 2018, The Guardian 2018, Financial Times 2018).
While the UK is the only country in which some companies are required to publish their gender pay gaps publicly, many countries are adopting pay transparency policies, all with the declared objective of reducing the gender pay gap. The argument for these initiatives goes as follows: because women are paid less than men on average, pay transparency is an information shock that asymmetrically alters the bargaining power of male and female employees vis-à-vis the firm (Cullen and Perez-Truglia 2018b). Because unequal pay can damage a firm's reputation, the shock also gives targeted firms an incentive to hire more women into better-paid positions and discourages the promotion of male employees. In turn, this could translate into improved pay and occupational outcomes for women relative to men. This paper tests these theoretical predictions in the UK setting. The British government passed the Equality Act 2010 (Gender Pay Gap Information) Regulations 2017 in February 2017. The act requires all firms registered in Great Britain with at least 250 employees to publish, on a dedicated government website, a series of indicators that include the mean and median percentage gender gaps in hourly pay and the share of women in each quartile of the wage distribution. If a firm has at least 250 employees at the end of a financial year (April), it must provide these figures by the end of the following financial year. According to the government, all firms required to comply with the law did so during its first two years of operation. To identify the impact of this policy on the wages and occupational outcomes of male and female workers, we use a difference-in-difference strategy that exploits variation across firm size and over time in the application of the government mandate.
To avoid capturing any potential impact of this policy on firm size, we define treatment status based on firms' number of employees before the introduction of the mandate. To enhance comparability, we restrict the sample to firms within 50 employees of the 250 threshold. To conduct this analysis, we use the Annual Survey of Hours and Earnings (ASHE) from 2012 to 2019. This is an annual employer survey covering 1 percent of the UK workforce, designed to be representative of the employee population. Crucially for us, it provides information on gender, number of employees, firm and individual identifiers, wage components, hours worked, tenure, occupation, and industry. Our analysis delivers two key findings. First, the mandate changes the occupational composition of the pool of female employees in treated firms compared to control ones, increasing the probability that women work in above-median-wage occupations by 5 percent relative to the pre-policy mean. As this effect comes from newly hired women, it has so far failed to translate into a visible increase in women's salaries. However, our second finding is that pay transparency leads to pay compression from above: following its introduction, the mandate leads to a 2.8 percent decrease in male real hourly wages in treated firms relative to control ones. In our preferred specification, this effect is statistically different from the impact on women's real hourly pay. A series of event-study exercises shows that these results do not capture pre-policy differential trends in the outcomes of interest between treated and control groups. With additional robustness checks, we rule out that our estimates capture the impact of time shocks affecting firms above and below the 250 threshold differently. First, our estimates are unchanged in triple-difference regressions that account for within-group time shocks common to male and female employees.
Second, difference-in-discontinuity specifications that control for firm-size-specific time shocks deliver the same results as our difference-in-difference model. Third, we estimate placebo regressions that pretend the policy binds at different firm-size thresholds, and find no significant effect of these placebo policies. Finally, we check that our estimates are not sensitive to the choice of the estimation sample around the 250 cutoff and that they are robust to the year used to define treatment status. To delve into the mechanisms driving the estimated effects, and in particular to understand how treated firms may have been able to attract more women, we analyze their hiring strategies. For this, we exploit a unique data set compiled by Burning Glass Technologies (BGT hereafter) that collates UK online job listings from 2012 onward. We focus specifically on wage-posting decisions and the wording of job listings. Many studies document a gender gap in bargaining skills in favor of men (Leibbrandt and List 2015, Bowles et al. 2007, Babcock et al. 2003). Despite this, fewer than 40 percent of vacancies in the BGT data contain automatically identifiable information on wages. A recent strand of the psychology and management literature also suggests that women are less likely to apply for jobs whose postings use male-oriented wording, that is, vocabulary usually associated with men in implicit association tests (Tang et al. 2017, Gaucher et al. 2011). In light of these stylized facts, we combine the difference-in-difference strategy with a text analysis of job listings to test whether treated firms change their wage-posting decisions and the wording of their job listings following the introduction of the pay transparency policy. Our results suggest that firms belonging to industries with a high gender pay gap are a third more likely to post wages relative to the pre-treatment period.
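To make the two text margins concrete, the following is a minimal sketch of how a single listing could be flagged for posting a wage and scored for gender-coded wording. It is a hedged illustration under stated assumptions: the regular expression and the short word-stem lists are hypothetical stand-ins chosen for readability (the stems are in the spirit of gender-coded word lists such as Gaucher et al. 2011), not the classifier or dictionary applied to the BGT data.

```python
import re

# Hypothetical, illustrative word stems; NOT the dictionary used in the paper.
MASCULINE_STEMS = ("compet", "dominant", "lead", "assert", "decisive", "ambitio")
FEMININE_STEMS = ("support", "understand", "commit", "interpersonal", "cooperat", "warm")

# Hypothetical pattern for an explicit pound-sterling wage, e.g. "£35,000" or "£12.50".
WAGE_PATTERN = re.compile(r"£\s?\d[\d,]*(?:\.\d+)?")


def posts_wage(text: str) -> bool:
    """True if the listing contains an automatically identifiable wage figure."""
    return WAGE_PATTERN.search(text) is not None


def gendered_word_counts(text: str) -> tuple[int, int]:
    """Count words that start with a masculine-coded or a feminine-coded stem."""
    words = re.findall(r"[a-z]+", text.lower())
    masculine = sum(any(w.startswith(s) for s in MASCULINE_STEMS) for w in words)
    feminine = sum(any(w.startswith(s) for s in FEMININE_STEMS) for w in words)
    return masculine, feminine


def masculine_share(text: str) -> float:
    """Share of gender-coded words that are masculine-coded; 0.5 if none appear."""
    masculine, feminine = gendered_word_counts(text)
    total = masculine + feminine
    return masculine / total if total else 0.5


listing = "We seek a competitive, ambitious leader. Salary £35,000 per year."
print(posts_wage(listing), gendered_word_counts(listing), masculine_share(listing))
# prints: True (3, 0) 1.0
```

Using a share rather than a raw count keeps long and short listings comparable; assigning 0.5 to listings with no coded words is an arbitrary neutral convention.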
As for the wording of job listings, while we find that firms using male-oriented wording tend to publish a larger gender pay gap, we fail to detect any significant impact of the policy on this margin. Finally, to provide a comprehensive picture of the impact of this policy on targeted firms, we study the reaction of the stock market to the publication of gender pay gap indicators by firms listed on the London Stock Exchange, which represent around 10 percent of firms targeted by the mandate and one third of listed firms. This analysis indicates that, in the first year of the mandate, firms' cumulative abnormal returns decrease by around 35 basis points in the aftermath of publication. While this effect fades after four days, it shows that firms publishing gender pay gap indicators are subject to investor scrutiny. In turn, this suggests that a reputation motive could explain why firms reacted to the policy. Overall, this paper contributes to several strands of literature. First, it adds to the growing number of studies in economics and management analyzing the impact of pay transparency policies on personnel management decisions and the gender pay gap (Gulyas et al. 2020, Baker et al. 2019, Bennedsen et al. 2019, Burn and Kettler 2019, Mas 2017). The closest studies to ours are Gulyas et al. (2020), Baker et al. (2019), and Bennedsen et al. (2019). Baker et al. (2019) study the effect on the gender pay gap of a Canadian law requiring public sector organizations to publish the salaries of employees earning above a certain pay threshold, while Gulyas et al. (2020) and Bennedsen et al. (2019) analyze the impact on the gender pay gap of a 2011 Austrian law and a 2006 Danish law, respectively, which mandate that private firms provide employees' representative pay measures by gender and occupation. Both Baker et al. (2019) and Bennedsen et al.
(2019) find that transparency leads to pay compression from above, while Gulyas et al. (2020) find no impact on individual wages or the gender pay gap. Relative to these studies, the UK mandate has two unique features that could improve our understanding of the effects of pay transparency. First, it requires the publication of the percentage gender pay gap, rather than of pay levels by gender. When pay levels by gender are disclosed, both male and female workers' bargaining power may increase, as all employees learn not only about gender differentials but also about the pay of their own gender. In the UK, this second channel is shut down. Second, the public disclosure of the information, coupled with extensive media attention, magnifies the information shock and stimulates behavioral responses. Beyond studying a unique setting, our paper also offers a comprehensive analysis of the impact of the pay transparency policy, including its effects on firms' hiring strategies and on the stock market. Next, our text analysis of job listing data contributes to the growing number of papers from different fields that study implicit biases in job postings (Burn et al. 2019, Tang et al. 2017, Mikolov et al. 2013, Gaucher et al. 2011). In particular, to the best of our knowledge, this is the first paper to document a correlation between firms' hiring strategies and the magnitude of their gender pay gap. More broadly, this analysis adds to the growing strand of economics papers that use job advertisement data to study labor market dynamics, from the evolution of skill requirements to labor market concentration (Azar et al. 2019b, Azar et al. 2018, Deming and Kahn 2018, Azar et al. 2019a). Finally, our study contributes to the analysis of policies aimed at tackling the gender pay gap.
As policies such as gender quotas and paternity leave have so far been shown to have a negligible impact (Antecol et al. 2018, Wasserman 2019, Bertrand et al. 2019, Ekberg et al. 2013), it seems especially important to assess the role of other interventions, including pay transparency. The paper proceeds as follows. Section 2 describes the institutional setting and the UK transparency policy. Section 3 discusses the identification strategy. Section 4 describes the data used in the empirical analysis. Section 5 illustrates the main results. Section 6 reports the results of a battery of robustness checks. Section 7 discusses the potential mechanisms behind the main results, focusing in particular on firms' hiring strategies. Section 8 presents the stock market reaction. Section 9 concludes.

In 2015, the UK government launched a consultation process with employers to enhance pay transparency. At that time, the average gender pay gap for all employees in the UK stood at 19.1 percent. Moreover, women made up only 34 percent of managers, directors, and senior officials (Government Equalities Office 2015). In the government's view, "greater transparency will encourage employers and employees to consider what more can be done to close any pay gaps. Moreover, employers with a positive story to tell will attract the best talent" (Government Equalities Office 2015). In February 2017, this process resulted in the passing of the Equality Act 2010 (Gender Pay Gap Information) Regulations 2017. This mandate requires all firms registered in Great Britain with at least 250 employees to publish gender pay gap indicators both on their own website and on a dedicated website managed by the Government Equalities Office (GEO hereafter).[4][5] The timing of publication works as follows: if a firm has at least 250 employees at the end of a financial year (April), it must provide gender pay indicators by the end of the following financial year.
Firms themselves must calculate their number of employees, using guidelines provided by the government. Importantly, they have to adopt an extended definition of employee that includes agency workers. Partners are also included in the employee count but do not enter the calculation of the indicators. Finally, part-time workers have the same weight as full-time ones in the calculations. The indicators that firms have to report are: the overall mean and median gender hourly pay gaps, expressed in percentage terms; the overall mean and median gender bonus gaps; the proportion of male and female employees who receive any bonus pay; and the proportion of male and female employees in each quartile of the company's wage distribution. Table 1 provides sample means of these indicators for the two years, prior to 2020, in which firms have had to publish them. The mean gender pay gap is just below 15 percent and decreases by 1 percent between 2017/2018 and 2018/2019. The median gender gap is smaller in both years and increases slightly over time, suggesting that the decrease in the mean gap is driven by a drop in extreme values. Both the mean and the median bonus gaps are smaller, but the standard deviations reveal that some firms mistakenly reported the gap in levels rather than in percentage terms, which makes these mean values difficult to interpret.[6] The share of women receiving bonus pay is smaller than that of men in both years, and the ratio remains stable over time. The gender ratio along the wage distribution is balanced at the bottom, but the share of women is smaller in the upper part of the distribution. Yet, this proportion increases by around 1 percent over the two years. Finally, figure 1 also shows that the mean gender hourly pay gap is larger in firms that have a lower share of women at the top of the wage distribution. From now on, we will refer to these data as the GEO data.

[4] This legislation does not apply to Northern Ireland.
[5] The mandate applies to both the private and the public sector; however, the public sector was already subject to some transparency measures. According to regulations introduced in 2011, public bodies in England with over 150 employees were required to publish information annually on the diversity of their workforce, though not on the gender pay gap. The Welsh regulations, also introduced in 2011, require public bodies to publish the number of male and female employees broken down by pay level. Public authorities are also required to make arrangements for identifying and collecting (but not necessarily publishing) information about differences between the pay of people with protected characteristics such as gender or ethnicity. Where a difference can be linked to a protected characteristic, public authorities are required to set equality objectives to address its causes. Finally, Scottish public organizations with 20 or more employees have been required to publish information on the gender pay gap since 2012.
[6] When excluding the bottom and top 1 percent, the mean bonus gap stands at 23.22 in 2017/18 and 23.76 in the second year.

Three other features of this policy are important to understand in the UK context. First, the policy does not impose sanctions on firms that do not improve their gender pay gap over time. However, the Equality and Human Rights Commission, the body responsible for enforcing this regulation, can issue court orders and unlimited fines against firms that do not comply with the disclosure requirements. As of 2020, all firms were deemed to have complied. Figure 2 reports the distribution of submission dates for the two years in which the mandate has been in place. While some firms miss the deadline, the majority publish their data in the last month before it.
Second, this policy is likely to represent an information shock both inside and outside the firm. According to a survey of 855 private and non-profit firms with at least 150 employees conducted on behalf of the GEO, only one third of firms had ever computed their gender pay gap, and just 3 percent had made these figures publicly available. Moreover, up to 13 percent declared that staff are discouraged from talking about it, and 3 percent reported that their contracts include a pay secrecy clause (Downing et al. 2015). Finally, this policy is salient. Not only are the figures publicly available on a government website, but, as noted in the introduction, they also receive extensive media attention each year they are published. Importantly, figure 3 shows that Google searches for the term "gender pay gap" also spike around each year's deadline, indicating that this policy has attracted significant public interest.

To identify the impact of the 2017 transparency policy on wages, occupational outcomes, and firm-level outcomes, we exploit the variation across firm size and over time in its implementation. Specifically, we estimate a difference-in-difference model that compares the evolution of the outcomes of interest in firms whose size is slightly larger (treated group) or smaller (control group) than the 250-employee cutoff. As firm size can be endogenously determined, we define treatment status based on firm size in 2015, before the start of the consultation process to implement the mandate. To enhance comparability between treatment and control groups, in the main specification we consider firms within 50 employees of the 250 threshold. When studying employees' outcomes, our baseline regression model is as follows:

$$Y_{ijt} = \alpha_j + \theta_t + \beta_0 \,(\text{TreatedFirm}_j \times \text{Post}_t) + X_{it}'\gamma + Z_{jt}'\delta + \varepsilon_{ijt} \qquad (1)$$

where i is an employee working in firm j, which has 200-300 employees, in year t, with t running between 2012 and 2019; $\gamma$ and $\delta$ are coefficient vectors, and $\varepsilon_{ijt}$ is an error term.
The outcome $Y_{ijt}$ is a measure of the occupation held, job mobility, pay (hourly or weekly wages, bonuses, or allowances), or hours worked. As for the regressors, $\alpha_j$ are firm fixed effects that capture the impact of firm-specific time-invariant characteristics such as industry or firm culture. $\theta_t$ are year fixed effects that control for time shocks common to all firms, such as electoral cycles. $\text{TreatedFirm}_j$ is a dummy equal to one if a firm had at least 250 employees in 2015, and $\text{Post}_t$ is a dummy equal to one from 2017 onward. The vector $X_{it}$ includes individual controls. In regressions analyzing how the policy affects the composition of firms' workforces, individual controls are limited to age and age squared. When considering wages, we control for individual fixed effects to take compositional effects into account. In what follows, we also compare specifications in which the vector $Z_{jt}$ contains different time-varying firm-level controls, such as region-specific time shocks, industry linear trends, or measures of product-market concentration, such as interactions between the 2011 industry-level Herfindahl-Hirschman index and year fixed effects. Our main coefficient of interest is $\beta_0$, which, conditional on the validity of this identification strategy, captures any deviation from a parallel evolution of the outcome of interest between the treatment and the control group due to the introduction of the mandate. All regressions are weighted with UK Labour Force Survey weights, though in the appendix we show that our results do not depend on this choice. Standard errors are clustered at the firm level, though in the appendix we also present specifications with other clustering groups, such as firm size or firm size times industry. As our hypothesis is that this policy affects men and women differently, we estimate this regression separately by gender.
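The logic of the baseline difference-in-difference regression can be illustrated with a minimal sketch on simulated data. Everything below is hypothetical: the panel is simulated and balanced, the function names are ours, the effect size is plugged in only for illustration, and the sketch omits individual controls, the controls in $Z_{jt}$, Labour Force Survey weights, and firm-clustered standard errors. It only shows how firm and year dummies absorb $\text{TreatedFirm}_j$ and $\text{Post}_t$, so that $\beta_0$ is identified from their interaction.

```python
import numpy as np


def simulate_panel(n_firms=100, workers_per_firm=10, beta0=-0.028, seed=0):
    """Simulate a hypothetical worker-level panel for firms of 200-300 employees, 2012-2019."""
    rng = np.random.default_rng(seed)
    years = np.arange(2012, 2020)
    size_2015 = rng.integers(200, 301, size=n_firms)      # firm size in 2015 (200-300 inclusive)
    treated = (size_2015 >= 250).astype(float)             # TreatedFirm_j
    firm_effect = rng.normal(0.0, 0.3, size=n_firms)       # alpha_j
    year_effect = rng.normal(0.0, 0.05, size=len(years))   # theta_t
    rows = []
    for j in range(n_firms):
        for t, year in enumerate(years):
            post = float(year >= 2017)                     # Post_t
            mean = firm_effect[j] + year_effect[t] + beta0 * treated[j] * post
            for outcome in mean + rng.normal(0.0, 0.1, size=workers_per_firm):
                rows.append((j, t, treated[j], post, outcome))
    return np.array(rows), len(years)


def did_two_way_fe(data, n_firms, n_years):
    """OLS of the outcome on TreatedFirm x Post plus firm and year dummies; returns beta_0."""
    firm = data[:, 0].astype(int)
    year = data[:, 1].astype(int)
    treat_post = data[:, 2] * data[:, 3]                   # TreatedFirm_j x Post_t
    y = data[:, 4]
    n = len(y)
    firm_dummies = np.zeros((n, n_firms))
    firm_dummies[np.arange(n), firm] = 1.0                 # alpha_j (also spans the intercept)
    year_dummies = np.zeros((n, n_years - 1))              # theta_t, first year omitted
    later = year > 0
    year_dummies[np.arange(n)[later], year[later] - 1] = 1.0
    X = np.column_stack([treat_post, firm_dummies, year_dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]


data, n_years = simulate_panel()
print(f"estimated beta_0: {did_two_way_fe(data, n_firms=100, n_years=n_years):.4f}")
```

Because treatment status is fixed by 2015 firm size and the post-policy dummy is common to all firms, neither main effect enters the design matrix separately: each is collinear with the fixed effects, and only the interaction is estimated.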
All regression tables also report the p-value of the t-test of the equality of coefficients for men and women.

To study the overall effect of this government mandate on the outcomes of interest, we use several sources of data, including individual-level data on pay and occupational outcomes and firm-level data on job vacancies and stock prices. Here we first introduce the data used to measure employees' outcomes. This is the Annual Survey of Hours and Earnings (ASHE), an employer survey covering 1 percent of the UK workforce, conducted every year, and designed to be representative of the employee population. The ASHE sample is drawn from National Insurance records for working individuals, and their employers are required by law to complete the survey. Specifically, ASHE asks employers to report data on wages, paid hours of work, tenure in the firm, and pension arrangements for the selected employees, all of which are measured in April. Other variables relating to age, occupation, industrial classification, and firm size are also available. Once workers enter the survey, they are followed even when they change employer, though individuals are not observed when unemployed or out of the labor force. In practice, ASHE is an unbalanced panel data set at the employee level. From ASHE, we construct the following variables. First, to measure occupational outcomes and worker flows, we construct a dummy equal to one if a worker is employed in an occupation whose median wage is in the top two quartiles of the pre-policy (2012-2016) wage distribution. This includes skilled-trades, administrative, technical, and professional and managerial occupations. For brevity, we refer to this as "working in above-median-wage occupations". We also create a dummy variable equal to one if the worker has changed job in the last year (ASHE provides a categorical variable to measure this).
We also use months of tenure in the firm, though this is missing for around 3 percent of the estimation sample. Lastly, we construct a dummy variable equal to one if the employee leaves the firm in t + 1; by construction, this variable is missing in the last year of data. As for pay measures, the main variable of interest is log real hourly pay, including bonuses and allowances but excluding overtime pay; however, we also consider log basic real hourly wage, bonuses, and allowances separately. To study the impact of the policy on bonuses and allowances, we use the inverse hyperbolic sine transformation, which, unlike the logarithm, is defined at zero, to account for the fact that many workers do not receive any bonus or allowance. Finally, we consider log real weekly pay and weekly hours worked, distinguishing between contractual hours and overtime. In the empirical analysis, we use data over the period 2012-2019. We chose this time window for two reasons. First, the data on firm job advertisements that we use in the analysis of mechanisms are only available from 2012 onward. Second, the ONS occupational classification changed in 2010, and the variables that follow the new classification are only available from 2012 onward in the employee data set. As new waves of ASHE become available, we will add them to the estimation period. Table 2 provides summary statistics for the main outcomes, measured in the pre-treatment period. Several things are worth noting. First, the profile of workers in treated and control firms is remarkably similar. Second, focusing on the treatment group (columns 1 and 3), there is a six percent gender gap in the probability of working in above-median-wage occupations. Next, the unconditional percentage hourly pay gap amounts to 18 percent. There is also a large gender gap in the probability of receiving allowances or bonuses (35 and 33 percent, respectively), and a huge one in the amounts received (around 60 and 75 percent).
Men are also more likely than women to work in the private sector, although this share is already 80 percent. Finally, it is worth noting that among both men and women, only one third of workers are covered by a collective agreement. This figure is important when thinking about the mechanisms through which the policy may affect wage and occupational outcomes. In principle, pay transparency may induce women, especially those covered by collective agreements, to put pressure on employers to obtain promotions or wage increases. Yet, with such a low share of women covered, this channel is unlikely to play an important role in triggering firms' responses.

This section illustrates our key findings. First, we present the results on occupational outcomes and job mobility; then we move to the analysis of wages, considering both different pay measures and various components of wages. Figure 4 introduces the analysis of occupational outcomes. In particular, it shows the trends in the variable "above-median-wage occupation" over the period 2012-2019 for employees working in treated firms (250 to 300 employees) and in control firms (200 to 249 employees). The top graph reports the trends for men, while the bottom one refers to women. Two things can be observed from these figures. First, the evolution of this variable in the pre-policy period appears comparable across treatment and control groups, for both male and female employees. Second, while the top graph suggests that the male occupational distribution has not been affected by the policy, the bottom graph suggests that treated firms may have changed the composition of their female workforce after the introduction of the policy, namely by increasing the share of women in above-median-wage occupations. Table 3 turns to the regression analysis. Panel A reports the estimates of $\beta_0$ for men, while Panel B focuses on women, and each column refers to a different specification.
Column 1 reports the estimates of the baseline specification, which controls for firm and year fixed effects. According to these results, the mandate increases the probability that women work in above-median-wage occupations by 3 percentage points, or 5 percent relative to the pre-policy mean reported at the bottom of the table. In contrast, the policy does not seem to affect the occupational distribution of men. Column 2 adds individual controls for age and age squared, but the results change little. Column 3 further includes year times region fixed effects to control for local labor market time shocks, and once again the results are little affected. Columns 4 to 6 add different industry- and firm-level controls. Specifically, column 4 includes industry linear time trends, column 5 includes the 2011 industry-level Herfindahl-Hirschman index of product market concentration interacted with year fixed effects, and column 6 includes interactions between firms' 2011 output level and year fixed effects. None of these controls affects the estimates of $\beta_0$ for either men or women. As the results are very similar across specifications, in what follows we take the specification of column 3 as our benchmark. Table 4 complements these results by analyzing the impact on job mobility. Specifically, column 1 reports the impact on the probability of working in above-median-wage occupations, column 2 the impact on the probability of having joined the firm in the last year, column 3 the impact on months of tenure in the firm, and column 4 the effect on the probability of leaving the firm in t + 1. The results in columns 2 and 3 suggest that the positive impact on women's occupational outcomes comes from newly hired women. Column 4 shows instead that the policy has no effect on the probability of leaving the firm for either men or women.
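The conversion between the two ways of stating the occupational effect is the ratio of the estimated coefficient to the pre-policy mean of the outcome for women. As a back-of-the-envelope check using only the rounded figures quoted above (the exact mean is the one reported in Table 3), a 3 percentage point effect that equals 5 percent of the mean implies a pre-policy share of roughly 0.6:

```latex
\frac{\hat{\beta}_0}{\bar{Y}^{\,\text{pre}}} \approx 0.05,
\qquad \hat{\beta}_0 \approx 0.03
\quad\Longrightarrow\quad
\bar{Y}^{\,\text{pre}} \approx \frac{0.03}{0.05} = 0.6
```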
As the policy does not affect men's occupational outcomes or job mobility, the first implication of this table is that the overall gender composition of treated firms should have changed following this policy. While we cannot test this implication with the currently available data, we will be able to do so once we gain access to the Workplace Employment Relations Survey for 2011 and 2018. This will allow us to measure the share of women in treated and control firms both before and after the introduction of pay transparency legislation. The second implication of these results is that pay transparency does not seem to affect retention rates in this context. Yet, as suggested by the "fair wage-effort hypothesis" (Akerlof and Yellen 1990), it will be important to continue monitoring this outcome, as the publication of the gender pay gap indicators, coupled with firms' responses, may affect the effort levels and retention rates of workers who perceive that they are being treated unfairly by their employer. Moreover, once we gain access to the Annual Business Survey, we will also study the impact of this policy on labor productivity. Figure 5 shows the raw trends in the variable "log real hourly pay" over the period 2012-2019 for employees working in treated firms (250 to 300 employees) and in control ones (200 to 249 employees). As above, the top graph reports the trends for men, while the bottom one refers to women. Two things may be observed from these figures. First, the evolution of real hourly pay in the pre-policy period appears comparable across treatment and control groups, for both male and female employees. Second, the top graph suggests that the real hourly pay of men working in treated firms may have dropped after the introduction of the mandate. As for women, the policy does not appear to have visibly affected their real wages. Table 5 reports the estimates of the difference-in-difference model for this outcome.
Panel A reports the estimates of $\beta_0$ for men, while Panel B focuses on women. Each column refers to a different specification. Column 1 presents the estimates from the baseline specification, with firm, year, and individual fixed effects. According to these results, the transparency policy decreases men's real hourly pay by around 2 percent in treated firms relative to control ones after the introduction of the mandate, and this effect is significant at the 5 percent level. In contrast, the policy does not seem to affect female real wages. Column 2 adds firm times individual fixed effects. As the results are practically unchanged, the drop in men's real wages is a within-firm, within-individual effect, meaning that it is experienced by individuals who were already employed at the firm before the introduction of the mandate. Column 3 adds year times region fixed effects to the baseline specification. Point estimates increase slightly in magnitude, but their significance level does not change. Next, as above, columns 4 to 6 add different industry- and firm-level controls to the specification of column 3, but the main conclusions of the analysis are unchanged: pay transparency leads to pay compression from above. Importantly, as indicated by the p-value of the t-test of the equality of coefficients for men and women, the effects by gender are statistically different. In other words, this policy leads to a significant reduction of the gender pay gap, amounting to around 15 percent of the pre-policy mean. Tables 6 and 7 further unpack the effects on male hourly wages. First, table 6 shows that weekly wages, rather than hours worked, are the margin of adjustment. Second, table 7 shows that the changes brought about by the policy are mainly due to contractual wages rather than allowances and bonuses.
Taken together, these results suggest that the slowdown in male real hourly wages may be explained by a lower probability of being promoted, though our data do not allow us to measure this precisely, as they contain only occupational information and no job-level information. The last point we make in this section concerns the effect on women's pay. In light of the results on occupational outcomes, we might have expected to see an increase in women's wages. Two factors may explain why this effect has not materialized. First, both treated and control firms might have raised women's wages if they compete for the same workers. Yet, in figure 5, we do not see any sharp increase in women's wages after the introduction of pay transparency in either the treatment or the control group. A second explanation has to do with compositional effects. As treated firms hire more women into above-median-wage occupations relative to the control group, the average woman in a treated firm becomes less experienced than in the control group. We believe that this is a very likely explanation for why we fail to see an increase in women's wages. The validity of our identification strategy depends on three assumptions. First, the parallel-trend assumption must hold, that is, the evolution of the outcomes of interest must be comparable in treated and control firms prior to the introduction of the policy. Second, our estimates must not capture the effect of other time shocks that coincide with the introduction of pay transparency and affect firms on the two sides of the 250-employee cutoff differently. Third, the results must not depend on the size of the bandwidth around the policy cutoff, nor on the year chosen to define treatment status. Parallel-trend assumption. To support the validity of the parallel-trend assumption, we perform event-study exercises.
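For concreteness, the event-study and triple-difference specifications discussed below can be sketched as follows. This is our hedged reconstruction, not the authors' displayed equations: it assumes regression 1 is a two-way fixed-effects model with firm effects α_j, year effects θ_t and individual controls X_it, with TreatedFirm_j equal to one for firms with at least 250 employees in 2015 and Post_t equal to one from 2018 onward; the symbols λ, γ and ε are our own notation, and the actual specifications may include further fixed effects (e.g. year-by-region).

$$
Y_{ijt} = \sum_{k \in \{2012,\dots,2019\} \setminus \{2017\}} \beta_k \, \mathrm{TreatedFirm}_j \times \mathbf{1}[t = k] + \alpha_j + \theta_t + X_{it}'\gamma + \varepsilon_{ijt}
$$

$$
Y_{ijt} = \beta_0 \, \mathrm{TreatedFirm}_j \times \mathrm{Post}_t + \beta_1 \, \mathrm{TreatedFirm}_j \times \mathrm{Post}_t \times \mathrm{Fem}_i + \lambda_1 \, \mathrm{Fem}_i \times \mathrm{Post}_t + \lambda_2 \, \mathrm{Fem}_i \times \mathrm{TreatedFirm}_j + \lambda_3 \, \mathrm{Fem}_i + \alpha_j + \theta_t + X_{it}'\gamma + \varepsilon_{ijt}
$$

In the triple-difference sketch, β_0 is the effect for men and β_0 + β_1 the overall effect for women, so a test of β_0 + β_1 = 0 uses the standard error

$$
\widehat{\mathrm{se}}\big(\hat\beta_0 + \hat\beta_1\big) = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_0) + \widehat{\mathrm{Var}}(\hat\beta_1) + 2\,\widehat{\mathrm{Cov}}(\hat\beta_0, \hat\beta_1)}
$$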
Specifically, we augment regression 1 with leads and lags of the mandate, as in equation (2). Figures 6 and 7 report the estimates of β_k for the probability of working in above-median-wage occupations and for log real hourly pay. In each figure, the top graph refers to men, while the bottom one refers to women. Note that 2017 is taken as the reference year. The leads of the mandate are insignificant for both outcomes and both genders, which strongly supports the hypothesis that the outcomes of interest evolved comparably across treated and control firms before the introduction of the mandate. By contrast, the effect on women's probability of working in above-median-wage occupations is already visible in the first post-mandate year and increases in the second year. The negative effect on male hourly pay becomes clearly visible and significant in the second year of the treatment period. In the appendix, tables A2 and A3 also show that our estimates change little when the pre-treatment period is progressively restricted, which further supports the hypothesis that we are not capturing differential pre-trends between the treated and control groups. Contemporaneous shocks. To make sure that our estimates do not capture the effect of other phenomena that occur at the same time as the introduction of pay transparency requirements and affect treated and control firms differently, we perform three robustness checks. First, table 8 compares the estimates from the difference-in-difference model with those from a triple-difference model that uses gender as the third difference, in which Fem_i is a dummy variable equal to one if i is a woman and all other variables are defined as in regression 1. This alternative specification therefore controls for within-group time shocks that are common to male and female employees. Table 8 reads as follows.
The first three columns refer to the outcome "working in above-median-wage occupations", while columns 4-6 focus on log real hourly pay. For each outcome, the first column reports the estimates of the difference-in-difference model for men, the second column the effects for women, and the third the estimates from the triple-difference model. At the bottom of columns 3 and 6, we also report the p-value of the t-test for the overall effect on women, i.e. the sum of the coefficient for men and the differential effect for women. The estimates from the triple-difference model are practically indistinguishable from those of the difference-in-difference model, for both the occupational outcome and wages. The only difference is that in column 6 the coefficient on the differential effect of the policy on men's and women's wages is marginally insignificant. Nevertheless, the overall effect on women is close to zero and insignificant. We next perform a second robustness check to support the hypothesis that our estimates do not capture the effect of other time shocks that coincide with the introduction of pay transparency and affect firms on the two sides of the 250-employee cutoff differently. Table 9 compares the results of the difference-in-difference model with those of a difference-in-discontinuity model, in which δ_reg denotes regional fixed effects and FirmSize_j,2015 is a continuous variable measuring the number of employees in firm j in 2015. The main difference between our main specification and this one is that the difference-in-discontinuity model allows for the possibility that firms with different numbers of employees are on different trends (Grembi et al. 2016). Although our event studies suggest that this is not the case, this exercise provides a further check on this assumption. Table 9 reads as follows. Panel A compares the estimates of the different models for men, while Panel B focuses on women.
In each panel, the first three columns refer to the occupational outcome, while the last three refer to log real hourly wages. For each outcome and gender, the first column reports the estimates of the impact of the transparency policy from the double-difference model, while the second column presents those from the difference-in-discontinuity model. Although the coefficients are significant only at the 10 percent level in this specification, the point estimates for both the occupational outcome and wages are barely affected. Finally, we run a series of placebo tests that pretend the mandate binds at different firm-size thresholds. Figures 8 and 9 present the estimates of these placebo policies, together with 95 percent confidence intervals. The placebo cutoff is indicated next to each estimate, and the highlighted estimate is the one obtained at the actual policy cutoff. In each regression, the estimation sample includes firms within ±50 employees of the threshold considered. Reassuringly, the "150" placebo mandate does not appear to affect either male or female outcomes. This further rules out the possibility that we are capturing time shocks that happen at the same time as the mandate and affect larger firms differently from smaller firms. For larger placebo cutoff values, it should be noted that these regressions include all treated firms. That these effects are non-zero may simply point to heterogeneous effects of the policy across firm size, consistent with the idea that larger firms are more exposed to public scrutiny. Specification. Our third and final set of robustness checks verifies that our results are robust to the choice of the bandwidth around the 250 cutoff and do not depend on defining treatment status based on firms' number of employees in 2015.
Figures 10 and 11 show how the estimates of β_0 from equation 1 change when the bandwidth around the 250 cutoff is narrowed or widened. As above, the top graph in each figure refers to men, while the bottom one refers to women. The x-axis reports the estimated coefficients with 95 percent confidence intervals, while the y-axis reports the bandwidth considered, from ±30 to ±80 employees around the policy cutoff. The estimates for the bandwidth of 50 correspond to the main specification. Figure 10 shows that the effect on women's probability of working in above-median-wage occupations is especially stable for bandwidths between 30 and 60, while it vanishes for larger samples, possibly because the treatment and control groups become less comparable. Figure 11 shows instead that the estimated coefficients for men's real hourly pay are very similar across specifications and become marginally insignificant only when the model is estimated on the smallest sample. Conversely, the estimates of the coefficient of interest for women's hourly pay are always close to zero and insignificant. Finally, table 10 compares the results when we change the year used to define treatment status. The table reads as follows. Panel A refers to men, and panel B to women. In each panel, columns 1-4 refer to the outcome "working in above-median-wage occupations", while columns 5-8 concern the outcome log real hourly pay. For each outcome, the first column reports the results from the main specification. The following columns present the estimates obtained when treatment status is defined based on firms' number of employees in the year indicated at the top of the column: 2012, 2013, or 2014. While the estimates that are significant in the main specification become marginally insignificant for one year, they are significant and similar in magnitude for all the other years.
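The sample definitions used in the placebo exercise (figures 8 and 9) and in the bandwidth checks (figures 10 and 11) can be sketched as follows. This is a minimal illustration under our own assumptions: firm size is measured in the base year used to define treatment (2015 in the main specification), the window around a cutoff is inclusive on both sides, and the function name `cutoff_sample`, the firm sizes and the cutoffs looped over are hypothetical. With a cutoff of 250 and a bandwidth of 50, it reproduces the main sample of firms with 200 to 300 employees, treated when they have at least 250.

```python
import numpy as np


def cutoff_sample(firm_size, cutoff, bandwidth=50):
    """Select firms within +/- bandwidth of a (possibly placebo) cutoff and flag treatment.

    firm_size: employees in the base year used to define treatment status.
    Returns (in_sample, treated): a boolean mask and a 0/1 treatment indicator.
    """
    size = np.asarray(firm_size)
    in_sample = (size >= cutoff - bandwidth) & (size <= cutoff + bandwidth)
    treated = (size >= cutoff).astype(int)
    return in_sample, treated


# Hypothetical firm sizes in the base year
sizes = np.array([150, 199, 200, 249, 250, 300, 301])

# Main specification: cutoff 250, bandwidth 50 -> firms with 200 to 300 employees
in_sample, treated = cutoff_sample(sizes, cutoff=250)

# Placebo cutoffs: number of firms in the window and number of them "treated"
for c in (150, 250):
    s, t = cutoff_sample(sizes, c)
    print("cutoff", c, "firms", int(s.sum()), "treated", int(t[s].sum()))

# Bandwidth grid around the actual cutoff, from +/-30 to +/-80 employees
for bw in range(30, 81, 10):
    s, _ = cutoff_sample(sizes, 250, bw)
    print("bandwidth", bw, "firms", int(s.sum()))
```

Each (cutoff, bandwidth) pair defines a separate estimation sample and treatment indicator, on which the difference-in-difference model is then re-estimated.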
To sum up, our estimates are remarkably stable across different specifications and sample sizes, which strongly supports the validity of our identification strategy.

7 Mechanisms: firms' hiring strategies

To delve into the mechanisms driving the estimated effects, and in particular to understand how treated firms may have been able to attract more women, we analyze their hiring strategies along three dimensions: the effect of the policy on wage-posting decisions, the use of gendered wording, and the offer of flexible working arrangements. Here, we present preliminary results on the first two dimensions. To conduct this analysis, we use Burning Glass Technologies (BGT) job-advertisement data for the period 2012-2019. The data comprise around 40 million de-duplicated individual job vacancies, collected from a wide range of online job listing sites, and include a rich set of information. First, each observation includes the text of the job advertisement. Second, more than 95 percent of vacancies have an occupational SOC identifier and a county identifier. Around one third of vacancies, or 13 million observations, include the name of the employer. One concern with the dataset is potential selection related to the presence of the firm name. To guard against this, we compared the occupational distribution of the stock of vacancies in BGT with employment in the Labor Force Survey (LFS hereafter) for the same period. Figure C1 in the appendix shows that the two match well, mitigating the greatest potential concern regarding the representativeness of BGT. We merge the BGT data with the GEO data using a cosine-similarity name-matching algorithm for company names, and retain only firms that have an exact match, representing around 50 percent of the entire sample; section C.1 of the appendix provides a detailed description of the matching algorithm. In what follows, we present the key dimensions we explore in this matched data set.
Wage posting. Many studies document a gender gap in bargaining skills. In particular, women are less likely to ask for wage increases (Bowles et al. 2007, Babcock et al. 2003) and tend to avoid bargaining in jobs that leave wage negotiation ambiguous (Leibbrandt and List 2015). In BGT job vacancies, only around 30 percent of job listings contain wage information that can be automatically identified, with large heterogeneity across industries, as shown in figure 12. Moreover, as shown in table 11 and consistent with the studies cited above, GEO firms that are less likely to post wages also tend, on average, to have a higher gender pay gap and a lower share of women at the top of the wage distribution. This descriptive evidence suggests that wage posting may be an important margin of adjustment for firms seeking to attract more women. Gendered wording. A recent strand of lab experiments in psychology and management studies the importance of implicit biases in job postings (Tang et al. 2017, Gaucher et al. 2011). In particular, Gaucher et al. (2011) construct a list of job-listing-specific male- and female-oriented terms using implicit association tests. Table B1 in section B of the appendix shows the resulting dictionaries of terms, w, associated with each gender. The dictionaries are D_M and D_F for terms that favor men and women respectively, and map terms according to D : W → {0, 1}, depending on whether or not the term appears in the corresponding list in table B1. Using these dictionaries, we classify each job advertisement by a gender score computed over w, which runs over all distinct terms in the advertisement. A job description with a positive score is considered to use male-oriented wording, with the magnitude of the score scaled by the total length of the job description. Importantly, Gaucher et al.
(2011) present lab-based evidence that women are less willing to apply for a job if its posting uses male-oriented wording. Figure 13 shows the resulting distribution of the score, measured per 1,000 words. Observations in the bottom and top 1 percent of the score have been removed to improve the readability of the graph. While the score is centered at 0, the graph shows substantial variability across job listings in their gendered orientation. Next, in table 12, we analyze the raw correlations between job listings' gender score and the published gender pay gap indicators. Column 1 shows a positive correlation between firms' mean gender score and the reported gender pay gap. In other words, firms that use more male-oriented words in their job advertisements also have a larger gender pay gap. In addition, column 2 shows that a higher gender score is also associated with a lower share of women in the top quartile of the firm's wage distribution. Regression analysis. Overall, this descriptive evidence suggests that firms' performance on gender pay gap indicators may be correlated with their hiring strategies, so we now study the causal impact of the pay transparency policy on firms' wage-posting decisions and choice of job-advertisement wording. To do so, we need two additional elements: a control group, and the exact firm size required to perform the difference-in-difference analysis. To construct the final sample, we use FAME, the UK version of Amadeus, which covers all UK-registered firms. For around 30 percent of them, we have information on the number of employees for at least one year in the pre-treatment period, which is crucial to implement the difference-in-difference analysis. We then merge FAME with GEO firms using Companies House registration numbers.
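The gender score behind figure 13 and tables 12 and 14 is described in the text only in words. One formalization consistent with that description (built from the distinct terms w of advertisement i, positive for male-oriented wording, scaled by the advertisement's length and expressed per 1,000 words) is the following; the exact normalization is our assumption, not a formula taken from the paper:

$$
\mathrm{Score}_i = \frac{1000}{N_i} \sum_{w \in W_i} \big( D_M(w) - D_F(w) \big)
$$

where W_i is the set of distinct terms in advertisement i and N_i its total number of words. A minimal Python sketch of this assumed score follows; the regular-expression tokenizer and the two small word sets are hypothetical placeholders, not the dictionaries in table B1.

```python
import re

# Hypothetical toy dictionaries standing in for D_M and D_F (the real lists are in table B1)
MALE_TERMS = {"competitive", "dominant", "leader", "ambitious"}
FEMALE_TERMS = {"supportive", "collaborative", "caring", "understanding"}


def gender_score(text, male=MALE_TERMS, female=FEMALE_TERMS, per=1000):
    """Assumed score: (distinct male terms - distinct female terms) per `per` words."""
    tokens = re.findall(r"[a-z]+", text.lower())  # simple word tokenizer (assumption)
    if not tokens:
        return 0.0
    distinct = set(tokens)  # w runs over distinct terms only
    net = len(distinct & male) - len(distinct & female)
    return per * net / len(tokens)  # scaled by the advertisement's total length


ad = "We are a competitive, ambitious team seeking a supportive leader."
print(gender_score(ad))
```

In this example the advertisement has 10 words, three distinct male-coded terms and one female-coded term, so its score is positive, i.e. male-oriented under this assumed definition.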
Finally, we restrict the sample to FAME firms with 150 to 249 employees in the years 2014-2017, and merge the FAME firms that are not in GEO with BGT directly, using the same name-matching algorithm for company names. The final data set contains 1,529,893 observations on 8,046 firms. When we further restrict the sample to firms with 200 to 300 employees, which is our main estimation sample, we end up with 91,366 observations and 2,109 firms. To investigate the effect of the pay transparency policy on firms' wage-posting decisions and on the wording of job listings, we estimate a difference-in-difference model at the vacancy level, in which Y_ijt is either a dummy equal to one if vacancy i of firm j in year t contains wage information, or the gender score; α_j and θ_t are firm and year fixed effects, respectively; and Z_ijt contains 2-digit occupation fixed effects and occupation-specific time effects. Standard errors are clustered at the firm level, and regressions are weighted by occupation-employment shares from the LFS. Tables 13 and 14 present the preliminary results of this analysis: the first refers to wage posting, while the second shows the results for the gender score. In light of the variation across industries observed in the descriptive analysis, in both tables we explore potential heterogeneous effects across industries. In particular, in table 13 we present the results for the entire sample (column 1) and for industries with a low or high gender pay gap in the pre-treatment period. While the coefficient is marginally insignificant, the point estimate in column 1 suggests that treated firms may have been posting wages more frequently following the introduction of the policy.
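Written out, a vacancy-level specification consistent with these definitions would be the following. This is our hedged reconstruction rather than the authors' displayed equation: TreatedFirm_j (at least 250 employees in the pre-treatment period, as in the notes to table 13) and Post_t (assumed equal to one from 2018 onward, as in the worker-level analysis) are assumed to mirror the worker-level design, and β and γ are our notation.

$$
Y_{ijt} = \beta \, \mathrm{TreatedFirm}_j \times \mathrm{Post}_t + \alpha_j + \theta_t + Z_{ijt}'\gamma + \varepsilon_{ijt}
$$

Under this reading, β is the coefficient reported in tables 13 and 14, estimated by weighted least squares with LFS occupation-employment-share weights and firm-clustered standard errors, and re-estimated on the low- and high-gender-pay-gap industry subsamples for the heterogeneity columns.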
Interestingly, the next two columns show that the results are indeed heterogeneous across industries: firms in industries with a high pre-treatment gender pay gap become more likely to post wages after the pay transparency policy is introduced. As for the gender score, table 14 also points to potential heterogeneous effects across industries. In particular, while in the entire sample the policy does not seem to have influenced this margin of decision, column 3 shows that firms in industries characterized by a high gender score in the pre-treatment period may have reduced their use of male-oriented wording following the introduction of the pay transparency policy, though the coefficient is marginally insignificant. Overall, this preliminary analysis suggests that treated firms may have been able to attract more women into better-paid occupations by changing their hiring strategies. Our next step will be to investigate this channel further by exploring the composition of the gender score and studying the offer of flexible work arrangements. The public disclosure of firms' gender pay gaps may induce businesses to tackle gender pay differentials to preserve their reputation. The spike in Google searches for the topic "gender pay gap" around the deadlines for publishing gender pay gap indicators suggests that firms are under the scrutiny of the general public. But what may matter more to firms is what investors think. A negative stock market reaction to the publication of gender pay gap figures may constitute a strong incentive for a firm to improve its performance on gender equality. This section aims to measure this stock market response using the traditional event-study methodology (Bell and Machin 2018, Lee and Mas 2012). We focus on the first year of publication, as this is when gender pay gap indicators are most likely to represent an information shock for the market.
We first combine the list of firms publishing gender pay gap figures in the financial year 2017/18 with FAME to identify both firms that are directly listed on the London Stock Exchange (LSE) and those whose parent company is publicly listed. This leads us to identify 926 firms, or around 10 percent of firms publishing gender pay gap figures. Of this group, 101 are directly publicly listed, while the rest have a publicly listed parent company. Importantly, several firms can share the same parent company. As a result, we follow 405 distinct publicly listed firms, or 35 percent of all firms listed on the main market of the London Stock Exchange in 2018. Also note that 80 percent of firms belonging to the same group publish on the same date. Hence, in what follows, we use the publication date of the first firm in the group to publish. Extracting daily stock prices from Datastream, we then construct firms' abnormal returns (ARs) as the difference between a stock's actual return and its expected return, where the expected return is estimated using a simple market model in which r_jt is the return on firm j's stock on day t and r_mt is the return on the LSE all-share index on day t. Figure 14 shows the 3-day cumulative abnormal returns, CARs(-1, d, 1), in the five days before and after the publication date. While these are not statistically different from zero in the days prior to the publication date, they become negative from the publication date up to four days afterwards, with an average loss of around 35 basis points per day. Table 15 further investigates whether this drop may be related to performance on the gender pay gap indicators.
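The abnormal-return construction described above follows the standard market model. As a hedged sketch, in which the OLS fit over a pre-event estimation window is our assumption (the window length is not stated in this section) and a_j, b_j are our notation:

$$
r_{jt} = a_j + b_j r_{mt} + e_{jt}, \qquad AR_{jt} = r_{jt} - (\hat a_j + \hat b_j r_{mt}), \qquad CAR_j(-1, d, 1) = \sum_{t=d-1}^{d+1} AR_{jt}
$$

The Python function below, `car_market_model` (a hypothetical name), implements these three steps for a single firm on synthetic returns; averaging across firms and computing test statistics are omitted.

```python
import numpy as np


def car_market_model(r_firm, r_market, est_start, est_end, event_day, half_width=1):
    """Cumulative abnormal return over [event_day - half_width, event_day + half_width].

    The market model r_firm = a + b * r_market + e is fitted by OLS on the
    estimation window [est_start, est_end), assumed to precede the event.
    Returns (a_hat, b_hat, car).
    """
    r_firm = np.asarray(r_firm, dtype=float)
    r_market = np.asarray(r_market, dtype=float)
    X = np.column_stack([np.ones(est_end - est_start), r_market[est_start:est_end]])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, r_firm[est_start:est_end], rcond=None)
    abnormal = r_firm - (a_hat + b_hat * r_market)  # AR_t: actual minus expected return
    window = slice(event_day - half_width, event_day + half_width + 1)
    return a_hat, b_hat, abnormal[window].sum()


# Synthetic example: the stock tracks the index exactly, except for a -1% shock on day 150
days = np.arange(200)
r_m = 0.01 * np.sin(days)
r_j = 0.001 + 1.2 * r_m
r_j[150] -= 0.01
a_hat, b_hat, car = car_market_model(r_j, r_m, est_start=0, est_end=120, event_day=150)
print(a_hat, b_hat, car)
```

With half_width=1 the window covers days d-1, d and d+1, matching the 3-day CARs(-1, d, 1) plotted in figure 14.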
Column 1 regresses the 3-day CARs around the publication day on a constant; the average gender pay gap reported by the firms related to the same publicly listed firm, labeled "Group-avg GPG" in the table; a dummy equal to one if this gender pay gap is in favor of men, labeled "Group-avg GPG positive"; and the interaction between these two variables. Column 2 adds the following controls: a categorical variable for whether the listed firm publishes the GPG indicators directly or has a subsidiary publishing them, and the number of firms in the group publishing the GPG indicators. Column 3 adds industry fixed effects, and column 4 also controls for the log of market capitalization, the book-to-market ratio and the return on assets, all measured at t-1. While firms publishing a gender pay gap in favor of men do not seem to be penalized more than others, the main message of this analysis is that firms publishing gender pay gap indicators are under the scrutiny of investors. In turn, this suggests that the reputation motive may have played an important role in explaining the reaction of treated firms. To tackle the persistence of the glass ceiling phenomenon, many governments are promoting pay transparency policies. Exploiting the variation across firm size and over time in the application of the UK transparency policy, this paper shows that the mandate has increased the probability that women work in above-median-wage occupations by 5 percent, an effect driven by newly hired women. While this compositional effect has not yet translated into higher wages for women, it may start to do so as women are promoted to more senior positions. In addition, the UK pay transparency law has led to a 2 percent decrease in male real hourly wages in treated firms relative to control firms.
Finally, by combining the difference-in-difference strategy with a text analysis of job listings, we find suggestive evidence that treated firms in industries with a high gender pay gap became more likely to post wage information after the gender pay gap policy was introduced. Our findings have two main implications. First, pay transparency leads to pay compression from above. This conclusion is in line with the findings of other studies on pay transparency. In particular, Mas (2017) finds that pay transparency in the California public sector led to a 7 percent reduction in managers' compensation, while both Baker et al. (2019) and Bennedsen et al. (2019) find that disclosing employees' pay by gender reduces the gender pay gap through a negative effect on male real wages. Freezing the wage increases of better-paid employees may be the most viable option for firms to reduce their gender pay gaps in the short run. The second implication of our findings is that, by making the glass ceiling visible, pay transparency creates cracks in it. The pre-policy 4 percentage-point gender gap in the probability of working in above-median-wage occupations is more than halved by the disclosure of gender pay gap indicators. On top of this, the 2.8 percent negative effect of transparency on men's real wages corresponds to a decrease of approximately 15 percent in the in-sample pre-policy gender pay gap. By comparison, Bertrand et al. (2019) find that female board quotas, another widely discussed firm-level policy, have no impact on the gender pay gap. In other words, pay transparency seems to be more effective than other policies in cracking the glass ceiling. That said, it is important to stress that our analysis identifies short-term effects, and the effects of this policy must continue to be monitored in the long run to fully understand its impact on the labor market.
In particular, pay transparency may be effective only in the short run, when its novelty can trigger strong attention from the media, the stock market and the general public. Yet, as the UK government is currently discussing extending this policy to the disclosure of ethnicity pay gap indicators, our findings offer robust data-based evidence on the potential impact of such an extension (BEIS 2018). Our next step is to further explore the impact of the pay transparency policy on firms' hiring strategies. In particular, we will study how it affects offers of flexible working arrangements in job postings. Analyzing this dimension of firms' decisions seems especially important in light of the compositional effect that we find on women's occupational distribution.

Notes: This table reports the impact of pay transparency on various occupational outcomes, obtained from the estimation of regression 1. The estimation sample comprises individuals working in firms that have between 200 and 300 employees. Panel A presents results for men, Panel B for women. Each column refers to a different outcome, as specified in the column titles. All regressions include firm, year, region, year-region specific fixed effects and individual controls for age and age squared. A treated firm is defined as having at least 250 employees in 2015. The post dummy is equal to one from 2018 onward. All regressions are weighted with LFS weights. Heteroskedasticity-robust standard errors clustered at the firm level in parentheses. The pre-treatment mean represents the mean of the outcome variables for the treated group between 2012 and 2017. The p-value at the bottom of the table refers to the t-test on the equality of coefficients for men and women (reported in panels A and B). *** p<0.01, ** p<0.05, * p<0.1.

Notes: Columns 1 to 3 report the impact of pay transparency on the probability of working in an occupation above the median wage.
Columns 4 to 6 report the impact of pay transparency on log real hourly pay. The estimation sample comprises individuals working in firms that have between 200 and 300 employees. All regressions control for firm and year fixed effects. Columns 1 to 3 also include age and age squared. Columns 4 to 6 also include individual fixed effects. A treated firm is defined as having at least 250 employees in 2015. All regressions are weighted with LFS weights. Heteroskedasticity-robust standard errors clustered at the firm level in parentheses. The pre-treatment mean represents the mean of real hourly pay for the treated group between 2012 and 2017. The p-value at the bottom of the table refers to the t-test on the effect for women in the triple-difference regression (Treated Firm*Post + Treated Firm*Post*Female). *** p<0.01, ** p<0.05, * p<0.1.

Table 9: Diff-in-Diff vs Diff-in-Disc. Outcomes: above-median-wage occupation and log real hourly pay; models: Diff-in-Diff and Diff-in-Disc.

Notes: This table compares the impact of pay transparency on the main outcomes when treatment status is defined using different pre-policy years. The first four columns refer to the outcome "working in above-median-wage occupations", while the last four columns present the results for log real hourly pay. For each outcome, the column name indicates the year used to define treatment status. Panel A presents results for men, Panel B for women. In all regressions, the estimation sample comprises individuals working in firms that have between 200 and 300 employees. All regressions include firm and year-by-region fixed effects. Individual controls include age and age squared in columns 1-4, and individual fixed effects in columns 5-8. The post dummy is equal to one from 2018 onward. A treated firm is defined as having at least 250 employees in the year indicated at the top of the column. All regressions are weighted with LFS weights.
Heteroskedasticity-robust standard errors clustered at the firm level in parentheses. The pre-treatment mean represents the mean of the outcome variables for the treated group between 2012 and 2017. The p-value at the bottom of the table refers to the t-test on the equality of coefficients for men and women (reported in panels A and B). *** p<0.01, ** p<0.05, * p<0.1.

A treated firm is defined as having at least 250 employees in the pre-treatment period. High-GPG industries are those with a gender pay gap above the cross-industry median in the pre-treatment period. These include manufacturing, construction, banking and finance, and the public administration, education and health sectors. Heteroskedasticity-robust standard errors clustered at the firm level in parentheses. The pre-treatment mean represents the mean of the outcome variable for the treated group between 2012 and 2017. *** p<0.01, ** p<0.05, * p<0.1.

Notes: This table compares the results of our specifications with those we would obtain if treatment status were based on actual firm size. The first two columns refer to the outcome "working in above-median-wage occupations", while the last two columns present the results for log real hourly pay. For each outcome, the column name indicates the year used to define treatment status. Panel A presents results for men, Panel B for women. In all regressions, the estimation sample comprises individuals working in firms that have between 200 and 300 employees. All regressions include firm and year-by-region fixed effects. Individual controls include age and age squared in columns 1-2, and individual fixed effects in columns 3-4. The post dummy is equal to one from 2018 onward. In column 1, a firm is treated if it has at least 250 employees in 2015, while in column 2 a firm is treated whenever it has at least 250 employees. All regressions are weighted with LFS weights.
Heteroskedasticity-robust standard errors clustered at the firm level in parentheses. The pre-treatment mean represents the mean of the outcome variables for the treated group between 2012 and 2017. The p-value at the bottom of the table refers to the t-test on the equality of coefficients for men and women (reported in panels A and B). *** p<0.01, ** p<0.05, * p<0.1.

Notes: This table reports the impact of pay transparency on the probability of working in an occupation above the median wage, obtained from the estimation of regression 1. The estimation sample comprises individuals working in firms that have between 200 and 300 employees. Panel A presents results for men, Panel B for women. Each column refers to a different specification, as indicated at the top of each column. All regressions include firm, year, region, year-region specific fixed effects and individual controls for age and age squared. The post dummy is equal to one from 2018 onward. A treated firm is defined as having at least 250 employees in 2015. Heteroskedasticity-robust standard errors clustered at the firm level in parentheses. The pre-treatment mean represents the mean of the outcome variables for the treated group between 2012 and 2017. *** p<0.01, ** p<0.05, * p<0.1.

All regressions include firm, year, region, year-region specific fixed effects and individual fixed effects. The post dummy is equal to one from 2018 onward. A treated firm is defined as having at least 250 employees in 2015. Heteroskedasticity-robust standard errors clustered at the firm level in parentheses. The pre-treatment mean represents the mean of the outcome variables for the treated group between 2012 and 2017. *** p<0.01, ** p<0.05, * p<0.1.

Notes: This table reports the impact of pay transparency on the probability of working in an above-median-wage occupation, obtained from the estimation of regression 1. The estimation sample comprises individuals working in firms that have between 200 and 300 employees.
Panel A presents results for men, Panel B for women. Each regression clusters the standard errors at a different level, as indicated at the top of each column. All regressions include firm, year, region, and year-by-region fixed effects, and individual controls for age and age squared. A treated firm is defined as having at least 250 employees in 2015. The post dummy is equal to one from 2018 onward. All regressions are weighted with LFS weights. *** p<0.01, ** p<0.05, * p<0.1.

year-region specific and individual fixed effects. A treated firm is defined as having at least 250 employees in 2015. The post dummy is equal to one from 2018 onward. All regressions are weighted with LFS weights. *** p<0.01, ** p<0.05, * p<0.1.

B Gendered score

C Burning Glass Technologies

Because of the large number of job vacancy postings, we used a combination of techniques to match individual job vacancy postings to firm-level data from FAME or directly to the GEO list. We first collapsed all firm names in each data set down to a unique set of firm names using standard text-cleaning procedures. We then identified any exact matches between firm names in the postings and our firm-level data set, giving these a match score of one. We matched the remaining N firm names from the vacancy postings with the universe of official firm names, which has M unique entries, using a combination of techniques provided in the scikit-learn software package.
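The collapse-and-exact-match step can be sketched in plain Python. The text only says that "standard text cleaning procedures" were used, so the cleaning rules below (lower-casing, replacing "&" with "and", turning punctuation into spaces, squeezing whitespace) and the names `clean_name` and `exact_match` are hypothetical choices for illustration.

```python
import re


def clean_name(name):
    # Hypothetical cleaning rules; the paper does not list its exact steps.
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]+", " ", name)  # punctuation (and non-ASCII) -> space
    return " ".join(name.split())             # squeeze runs of whitespace


def exact_match(vacancy_names, official_names):
    """Collapse both lists to unique cleaned names, then match exactly.

    Returns ({cleaned vacancy name: (official name, 1.0)}, sorted list of
    unmatched cleaned vacancy names to pass on to the approximate matcher).
    """
    vacancies = {clean_name(n) for n in vacancy_names}
    officials = {clean_name(n) for n in official_names}
    matched = {n: (n, 1.0) for n in vacancies & officials}  # exact matches score one
    remaining = sorted(vacancies - officials)
    return matched, remaining
```

For example, "Acme Ltd." and "ACME LTD" collapse to the single name "acme ltd" before any matching is attempted, which is what keeps N well below the raw number of postings.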
First, the vacancy firm names are expressed as character-level 2- and 3-grams with a maximum of 8,000 features, creating a matrix $T$ with dimensions (number of vacancy firm names, $N$) $\times$ (number of features). The 8,000 features define a vector space in which we also express the official firm names, giving a matrix $G$. Matching directly with these matrices would require $N \times M$ inner products of 8,000-dimensional vectors. Instead, we created a reduced vector space of just 10 dimensions using truncated singular value decomposition on $T$, which yields a reduced-dimension matrix $\hat{T}$, and expressed $G$ as $\hat{G}$ in the reduced space. The vectors in $\hat{G}$ and $\hat{T}$ were then sorted into 500 clusters using k-means, which assigns a cluster to each firm name on both sides of the matching problem. For each cluster $c_i$, with $i \in \{1, \dots, 500\}$, the problem reduces to finding matches between $c_i(N) \leq N$ and $c_i(M) \leq M$ entries, where each equality can hold for at most one cluster (and rarely holds in practice). Within each cluster, we computed all pairwise cosine similarities between $c_i(T)$ and $c_i(G)$; that is, within a cluster and with features indexed by $f$, the match for each vacancy name $t \in c_i(T)$ is found by solving

$$\hat{g}(t) = \arg\max_{g \in c_i(G)} \frac{\sum_f T_{tf}\, G_{gf}}{\sqrt{\sum_f T_{tf}^2}\,\sqrt{\sum_f G_{gf}^2}}.$$

The score is the cosine similarity of the matched vectors scaled by 0.99, to distinguish exact matches from matches that are exact only in the vector space.
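The approximate-matching pipeline maps onto standard scikit-learn components. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `fuzzy_match`, TF-IDF weighting (the text specifies only character 2- and 3-grams capped at 8,000 features), fitting one k-means model on the stacked reduced vectors of both sides, and computing the within-cluster cosine similarities in the original n-gram space are all our choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def fuzzy_match(vacancy_names, official_names, max_features=8000,
                n_components=10, n_clusters=500, seed=0):
    """Match each (unique) vacancy firm name to its most similar official name
    within its k-means cluster. Returns {vacancy: (official or None, score or None)}.
    """
    # Character 2- and 3-grams; TF-IDF weighting is an assumption.
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3), max_features=max_features)
    T = vec.fit_transform(vacancy_names)  # N x features (sparse)
    G = vec.transform(official_names)     # M x features, same vector space

    # Truncated SVD fitted on T; G is projected into the same reduced space.
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    T_hat = svd.fit_transform(T)
    G_hat = svd.transform(G)

    # One k-means fit on both sides, so every name receives a cluster label.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(np.vstack([T_hat, G_hat]))
    t_labels, g_labels = labels[:len(vacancy_names)], labels[len(vacancy_names):]

    matches = {}
    for c in range(n_clusters):
        t_idx = np.flatnonzero(t_labels == c)
        g_idx = np.flatnonzero(g_labels == c)
        if t_idx.size == 0:
            continue
        if g_idx.size == 0:  # no official name shares this cluster
            for i in t_idx:
                matches[vacancy_names[i]] = (None, None)
            continue
        sims = cosine_similarity(T[t_idx], G[g_idx])  # dense |c(T)| x |c(G)| array
        best = sims.argmax(axis=1)
        for row, i in enumerate(t_idx):
            j = g_idx[best[row]]
            # Scale by 0.99 so these scores stay below exact matches (score 1).
            matches[vacancy_names[i]] = (official_names[j], 0.99 * float(sims[row, best[row]]))
    return matches
```

With the settings in the text (10 components, 500 clusters) the function needs far more names than clusters; on a toy list, `fuzzy_match(vacancies, officials, n_components=2, n_clusters=1)` reduces to a plain cosine-similarity argmax. The clustering trades recall for speed: a true match that lands in a different cluster from its vacancy name cannot be recovered.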