key: cord-0655284-93u11b5h
authors: Jahn, Najko; Matthias, Lisa; Laakso, Mikael
title: Transparency to hybrid open access through publisher-provided metadata: An article-level study of Elsevier
date: 2021-02-09
journal: nan
DOI: nan
sha: dde84b91254c19def4044f91cc6b2b31b4cc276d
doc_id: 655284
cord_uid: 93u11b5h
With the growth of open access (OA), the financial flows in scholarly journal publishing have become increasingly complex, but comprehensive data and transparency into these flows are still lacking. The opaqueness is especially concerning for hybrid OA, where subscription-based journals publish individual articles as OA if an optional fee is paid. This study addresses the lack of transparency by leveraging Elsevier article metadata and provides the first publisher-level study of hybrid OA uptake and invoicing. Our results show that Elsevier's hybrid OA uptake has grown steadily but slowly from 2015-2019, doubling the number of hybrid OA articles published per year and increasing the share of OA articles in Elsevier's hybrid journals from 2.6% to 3.7% of all articles. Further, we find that most hybrid OA articles were invoiced directly to authors, followed by articles invoiced through agreements with research funders, institutions, or consortia, with only a few funding bodies driving hybrid OA uptake. As such, our findings point to the role of publishing agreements and OA policies in hybrid OA publishing. Our results further demonstrate the value of publisher-provided metadata to improve the transparency in scholarly publishing by linking invoicing data to bibliometrics.
standardized and comprehensive publisher-provided data has required several updates to Unpaywall to improve hybrid OA identification and differentiation from delayed OA (Piwowar et al., 2019; Unpaywall, n.d.-b; Unpaywall, n.d.-c), illustrating ongoing challenges in tracking and comparing hybrid OA prevalence over time.
Pinfield et al.'s (2016) analysis of APC payment records provided by 23 United Kingdom (UK) higher education institutions revealed a sharp increase in central payments from 2007-2013, which was largely attributed to the introduction of block grants by Research Councils UK (RCUK) and non-compliance sanctions by the Wellcome Trust. Moreover, the study showed that OA fees were paid almost exclusively through block grants (92%), and only a small number of APCs were paid through internal funding (7%). In contrast, a recent Springer Nature survey found that authors draw on a range of funding sources to cover OA fees, such as dedicated institutional OA funds, block grants, OA agreements, or research grants (Monaghan et al., 2020). Most hybrid OA authors were supported through dedicated institutional OA funds (43%, excluding block grants) and OA agreements with Springer Nature (34%). The differences between these two studies might be due to different policy and funding arrangements, considering the introduction of OA agreements since Pinfield et al. (2016) and Monaghan et al.'s (2020) more regionally diverse sample.
Regional and policy differences in APC payments also came to light in a study by Jahn & Tullney (2016) that analyzed APC records from 30 German higher education and research institutions, the Austrian Science Fund (FWF), Jisc, and the Wellcome Trust. In particular, the study revealed large differences in the amount of hybrid OA funded from 2014-2015. Whilst hybrid OA accounted for less than 1% of APCs paid by German institutions (23 of 3,846), the three non-German research funders recorded a hybrid share of 75% (11,533 of 15,779). According to Jahn & Tullney (2016), this could point towards differences in science policy, such as hybrid OA being supported by the three non-German research funders but not by Germany's largest national funder, the German Research Foundation (DFG).
Another possibility is that German hybrid OA fees were paid from budgets not reported to the Open APC initiative, a crowd-sourcing effort (Pieper & Broschinski, 2018) from which the authors acquired their data. Among these unreported funds are research grants and research unit budgets, which author surveys identified as APC funding sources (Graaf, 2017; Monaghan et al., 2020). As such, Jahn and Tullney's (2016) findings could reflect the complexities and potential limitations of institutional OA spending data that Pinfield et al. (2016) and Monaghan et al. (2020) attributed to incomplete or missing records.
In recent years, national consortia in Europe negotiated publishing agreements covering hybrid OA fees for affiliated authors (Borrego et al., 2020). While these agreements improved invoicing workflows, internal assessments of hybrid OA uptake and invoicing remain challenging because transparent and comparative data have largely remained absent (Marques & Stone, 2020). Such publicly available information might be scarce because publishers lack or withhold such data, or due to confidentiality clauses (Marques & Stone, 2020; Marques et al., 2019; Monaghan et al., 2020).
In this paper, we focus on Elsevier, a prominent example in recent hybrid OA uptake and financial studies (Laakso & Björk, 2016; Pinfield et al., 2016). Elsevier's OA portfolio exemplifies the challenges in examining hybrid OA described above. One such challenge is distinguishing between different OA types: Elsevier supports delayed (Elsevier, n.d.-b), hybrid, and full OA, including so-called mirror journals, that is, full OA counterparts of hybrid journals that address OA policies opposed to hybrid OA (Harrison, 2019). Further, Elsevier processes APC invoices through various channels, such as agreements with research funders and library consortia or, in the absence thereof, the authors (Elsevier, n.d.-a, n.d.-c).
Surprisingly, we found Elsevier article-level metadata embedded in XML full texts indicating the articles' OA status and invoicing. Leveraging this publicly available data, we address the lack of transparency around hybrid OA noted in previous studies. In particular, we use this novel approach to answer the following questions:
• What was the uptake of Elsevier's hybrid OA publishing option between 2015 and 2019?
• Through which channels were hybrid APCs invoiced, and who were the recipients?
For this study, we collected data relating to Elsevier's hybrid OA option by drawing on multiple freely available data sources. We identified Elsevier hybrid journals through Elsevier's APC list and supplemented our sample with Crossref metadata and text-mined invoicing data to investigate the invoicing of immediate OA articles provided under a Creative Commons (CC) license in subscription-based journals. Figure 1 visualizes the automated workflow we used to collect data from Elsevier and Crossref. We identified OA articles through CC licenses in Crossref metadata records (Hendricks et al., 2020) and then downloaded the XML version of all CC-licensed articles published in a hybrid journal. From the XML files, we obtained the articles' OA status and invoicing information (see Table 1). We determined whether articles were immediate or delayed OA using the XML node openArchiveArticle and measured the uptake of hybrid OA. Moreover, we obtained the invoice channels and recipients of hybrid OA APCs. Using the openaccessSponsorType node, we distinguished between four invoice channels: invoices billed to authors, invoices issued as part of publishing agreements with funding bodies (cf. Elsevier, n.d.-a), invoices exempted through fee waivers (e.g., in "cases of genuine need" or due to society or university sponsorships, cf. Elsevier, n.d.-c), and other types not specified by Elsevier.
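The extraction step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was built in R with crminer and the Tidyverse. Only the node names openArchiveArticle and openaccessSponsorType come from the text; the sample record, its flat layout, the values "false" and "FundingBody", and the assumption that "true" in openArchiveArticle flags delayed OA are hypothetical placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record. The real full-text XML is larger and
# namespaced; the values shown here are illustrative assumptions only.
SAMPLE_XML = """
<article>
  <coredata>
    <openArchiveArticle>false</openArchiveArticle>
    <openaccessSponsorType>FundingBody</openaccessSponsorType>
  </coredata>
</article>
"""

# Node names taken from the paper's description of the metadata.
WANTED = ("openArchiveArticle", "openaccessSponsorType")


def local_name(tag):
    """Drop an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_oa_fields(xml_text):
    """Return the text of each wanted node, or None if the node is absent."""
    root = ET.fromstring(xml_text.strip())
    fields = {name: None for name in WANTED}
    for element in root.iter():
        name = local_name(element.tag)
        if name in fields and element.text:
            fields[name] = element.text.strip()
    return fields


def is_delayed_oa(fields):
    """Assumption: the string 'true' in openArchiveArticle marks delayed OA."""
    return (fields.get("openArchiveArticle") or "").lower() == "true"


if __name__ == "__main__":
    record = extract_oa_fields(SAMPLE_XML)
    print(record, "delayed OA:", is_delayed_oa(record))
```

Matching on the local element name rather than a fixed path keeps the sketch independent of the exact namespace and nesting, which are not specified in the text.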
Finally, when hybrid APCs were invoiced as part of agreements, we identified invoice recipients through the openaccessSponsorName node. We manually classified invoice recipients based on their institutional sectors, countries, and primary research areas. Following the OECD's Frascati Manual (OECD, 2015, p. 91), we coded for four sectors: business enterprise, government, higher education, and private non-profit. Due to the low article volume, we combined the business enterprise sector with invoice recipients Elsevier listed as "authors" and "third-party sponsor" into "Others". Moreover, we categorized invoice recipients according to the countries representing their scope of funding and based on the following primary research areas: health sciences, life sciences, physical sciences and mathematics, social sciences and humanities, broad (i.e., multiple research areas), and unknown. Further, we compared Elsevier's invoicing data with institutional spending data from the Open APC initiative.
Throughout this mostly automated data gathering and analysis process, we used tools from the Tidyverse (Wickham et al., 2019) for the R programming language (R Core Team, 2020). To allow for efficient data manipulation and retrieval, we imported the Crossref dump to Google BigQuery, applying the rcrossref (Chamberlain et al., 2020) parsers to extract relevant metadata fields. We used crminer (Chamberlain, 2020) to obtain the XML full texts from Elsevier.
This section first presents the results of our analysis of hybrid OA uptake in Elsevier's journal portfolio with a view to licensing, disciplinary differences, and citation impact. Then, we present a descriptive analysis of Elsevier's invoicing data, highlighting licensing and disciplinary differences for invoicing channels and invoice recipients.
Overview. As article output also grew over time, the relative share of OA articles in hybrid journals only increased slightly from 2.6% to 3.7%.
Citation Impact.
Furthermore, we compared the journals' field-specific citation impact and their OA uptake. Spearman's rho correlation coefficient was used to assess the relationship between the 2019 Source Normalized Impact per Paper (SNIP) value calculated by Scopus and the journals' OA uptake that year. Considering only journals with at least one OA article, we found a weak but positive correlation between journal impact and OA uptake (r_s = 0.1679, p < 0.001), suggesting that the relationship is not very strong, based on our data (see Figure 5).
Hybrid OA Uptake Invoice Channels. As can be seen from Table 4, hybrid OA APCs were most often invoiced to authors (n=41,725; 58.2%) and to a lesser extent as part of agreements (n=24,250; 33.8%). Interestingly, we also found a small number of cases where hybrid APCs were waived (n=4,345; 6.1%). Figure 6 illustrates that over the years, Elsevier has increasingly invoiced authors directly compared to research funders or academic consortia ("Agreement"). The share of fee-waived articles remained relatively stable, but we found different types of waivers. Around 51.7% of fee waivers were linked to a third party. One example is the French Académie des Sciences, which presumably covered OA publication for 853 OA articles in its society journals for affiliated authors. The remaining 48.3% of waived articles did not disclose any invoice recipient. Moreover, Figure 6 compares the invoicing channels based
We also observed large differences in OA invoicing among subject fields. Table 5 shows the number of OA articles by subject field and invoice channel. For articles in nursing, decision sciences, and pharmacology, toxicology and pharmaceutics, Elsevier predominantly invoiced authors, whereas most energy and chemical engineering articles were invoiced through agreements. Likewise, the majority of articles in materials science, chemistry, and physics and astronomy were not invoiced to authors but facilitated through agreements or waived.
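A cross-tabulation of articles by subject field and invoice channel, of the kind summarized in Table 5, could be sketched in Python as follows. This is only an illustration and not the authors' R/Tidyverse code. The records are invented placeholders rather than data from the study, and the function names and channel labels are hypothetical.

```python
from collections import Counter, defaultdict

# Invented placeholder records (subject field, invoice channel).
# These are NOT figures from the study; they only illustrate the tabulation.
RECORDS = [
    ("Nursing", "Author"),
    ("Nursing", "Author"),
    ("Nursing", "Agreement"),
    ("Energy", "Agreement"),
    ("Energy", "Agreement"),
    ("Energy", "Author"),
    ("Energy", "Waived"),
]


def crosstab(records):
    """Count articles per subject field and invoice channel."""
    table = defaultdict(Counter)
    for field, channel in records:
        table[field][channel] += 1
    return table


def row_shares(table):
    """Convert the counts in each subject field into shares of that field's total."""
    shares = {}
    for field, counts in table.items():
        total = sum(counts.values())
        shares[field] = {channel: n / total for channel, n in counts.items()}
    return shares


def dominant_channel(table):
    """Return the most frequent invoice channel for each subject field."""
    return {field: counts.most_common(1)[0][0] for field, counts in table.items()}


if __name__ == "__main__":
    table = crosstab(RECORDS)
    for field, share in row_shares(table).items():
        print(field, {channel: round(value, 2) for channel, value in share.items()})
    print(dominant_channel(table))
```

Row-wise shares make fields with very different article volumes comparable, which is why the sketch reports them alongside the raw counts.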
The large share of waived APCs in physics and astronomy can be attributed to a single 2015 issue of Nuclear and Particle Physics Proceedings.
Figure 8 shows the distribution of centrally invoiced articles by year and institutional sector. While UK-based OA invoices were mainly addressed to discipline-specific governmental and non-profit research funders, invoices to the Netherlands and Norway were issued to national academic consortia representing the higher education sector. In 2019, Elsevier also launched similar agreements in countries with lower publication output, including Hungary and Poland. In addition, we found that invoice recipients from the UK and the US mainly represented discipline-specific funders, while invoice recipients from other countries focused on a broad variety of disciplines (see Table 6). The table also highlights the proportion of APCs publicly disclosed through the Open APC initiative, showing higher disclosure rates for invoice recipients with a large CC BY share.
From 2015-2019, Elsevier recorded growth in the uptake of hybrid OA: The number of hybrid OA articles published per year doubled, the number of hybrid journals with at least one OA article grew by 21%, and the share of hybrid OA articles relative to closed-access articles in these journals increased from 2.6% to 3.7%. Like Laakso & Björk (2016), we found only a weak relationship between the journals' SNIP and hybrid OA uptake and observed disciplinary differences. In particular, we found the highest count of hybrid OA articles in physical sciences journals (see Table 3). This was followed by the life sciences and health sciences, whereas the social sciences had the lowest count. This order mostly reflects the disciplines' overall publication output. According to the Open Science Monitor (European Commission, 2019), the physical sciences publish the most articles, followed by the health sciences, life sciences, and social sciences.
Disciplinary differences in hybrid OA prevalence become more meaningful when considering the relative share of OA articles to closed-access articles. In line with previous research, we found that Elsevier journals from the life sciences and social sciences (Jubb et al., 2017; Kramer & Bosman, 2018; Laakso & Björk, 2016) recorded greater than typical hybrid OA uptake, whereas physical sciences journals generally had a lower than typical uptake (Kramer & Bosman, 2018; Laakso & Björk, 2016; Martín-Martín et al., 2018). In a systematic review of disciplinary OA publishing patterns, Severin et al. (2020)
On the other hand, the high hybrid OA uptake that we observed among the social sciences could point to the influential role of OA policies and invoicing agreements (Huang et al., 2020; Larivière & Sugimoto, 2018).
Our invoicing data analysis found that most APCs were invoiced to the author (n=41,725; 58.2%). However, it is important to emphasize that although Elsevier's metadata classifies the sponsor type as "author", this does not necessarily mean that authors paid for APCs themselves but rather that APCs were not invoiced through publishing agreements. It is possible that these APCs were covered through institutional funds or research grants. This would align with a recent Springer Nature survey that found hybrid OA was predominantly supported through institutional and funder sources (71%), followed by OA agreements (34%), while only 6% were paid from personal funds or savings (Monaghan et al., 2020). Further, we observed notable differences in licensing. Most hybrid OA articles invoiced to authors were licensed under the more restrictive CC BY-NC-ND license.
Previous research, while lacking dedicated studies on license selection, suggests that authors tend to select more restrictive license variants when given a choice (Fraser et al., 2020; Noorden, 2013; Rowley et al., 2017).
This study provides a snapshot of hybrid OA for Elsevier, the largest journal publisher, prior to the impact of the implementation of Plan S, an initiative to accelerate the transition to OA that will no longer support hybrid OA (cOAlition S, n.d.). While many Plan S signatories already had strong OA policies, this harmonized approach is likely to affect publishing decisions of funded authors, the licensing of their articles, and the offerings and pricing of publishers at a larger scale than before. Because the new requirements apply to research funded from 2021 onward, a comparative study on articles invoiced to Plan S signatories would be a fruitful endeavor.
The primary aim of this empirical study was to investigate Elsevier's hybrid OA publishing from 2015-2019 to better understand the volume and invoicing of hybrid OA and to present a novel, data-driven approach for such analyses. Our results indicate that although the number of hybrid OA articles has increased over time, its uptake has remained low. Notably, hybrid APCs were most often invoiced directly to the authors, followed by agreements, where only a few funding bodies were the primary drivers of hybrid OA. Finally, our findings highlight that publisher-provided metadata about the invoicing channels of (hybrid) OA can facilitate research into and increase the understanding of the financial flows of OA publishing.
Since the beginning, hybrid OA has been a challenging subject to study due to the lack of standardized ways for publishers to flag such content and because APC funding data have been limited to self-reported data, surveys, and other secondary sources.
This study presented a novel approach to studying APC invoicing that is based on publicly available publisher-provided metadata, which can be used on its own or in combination with other public data sources to gain more detailed and comprehensive insights into hybrid OA uptake and invoicing. If more publishers reported OA invoicing on the article level and in a machine-readable format, this would increase transparency and improve monitoring of the scholarly journal landscape over time. As hybrid OA has become a central element of OA policies of research funders and libraries, consumer organizations could require that invoicing information is added to the article-level metadata. As long as publishers do not provide this data in a structured and comprehensive format, they prevent benchmarking prices and therefore hinder competition. From recent science policy developments in Europe it appears that Big Deals have gained support and remain firmly in place in the form of transformative agreements. Through this study we can affirm that hybrid OA is complex as the financial flows involve research funders, libraries, consortia, and authors. However, it is on publishers to increase the transparency of OA publishing, including hybrid OA and transformative
Paying for open access: The author's perspective What are mirror journals, and can they offer a new world of open access? Crossref: The sustainable source of community-owned scholarly metadata Transformative agreements: A primer Evaluating the impact of open access policies on research institutions. eLife, 9 A study of institutional spending on open access publication fees in germany. 
PeerJ, 4, e2323 Monitoring the transition to open access: A report for the universities uk open access co-ordination group A study of open access publishing by NHMRC grant recipients Towards a Plan S gap analysis: open access potential across disciplines using Web of Science and DOAJ [Data set Hybrid open access-a longitudinal study The oligopoly of academic publishers in the digital era Do authors comply when funders enforce open access to research? Total cost of ownership" of scholarly communication: Managing subscription and APC payments together Opening the black box of scholarly communication funding: A public data infrastructure for financial flows in academic publishing Transitioning to open access: An evaluation of the UK Springer Compact Agreement Pilot Monitoring agreements with open access elements: Why article-level metadata are important Evidence of open access of scientific publications in google scholar: A large-scale analysis Publisher OA portfolios 2.0 (Version 2.0) [Data set The two-way street of open access journal publishing: Flip it and reverse it Double dipping in hybrid open access -chimera or reality? APCs in the wild': Could increased monitoring and consolidation of funding accelerate the transition to open access? 
figshare Citations, mandates, and money: Author motivations to publish in chemistry hybrid open access journals Researchers opt to limit uses of open-access publications Frascati manual 2015: Guidelines for collecting and reporting data on research and experimental development OpenAPC: A contribution to a transparent and reproducible monitoring of fee-based open access publishing across institutions and nations The "total cost of publication" in a hybrid open-access environment: Institutional approaches to funding journal article-processing charges in combination with subscriptions The state of OA: A large-scale analysis of the prevalence and impact of open access articles The future of OA: A large-scale analysis projecting open access publication and readership Open access at the national level: A comprehensive analysis of publications by finnish researchers The costs of double dipping From here to there: A proposed mechanism for transforming journals from closed to open access R: A language and environment for statistical computing. R Foundation for Statistical Computing Open access uptake by universities worldwide Academics' behaviors and attitudes towards open access publishing in scholarly journals Disrupting the subscription journals' business model for the necessary large-scale transformation to open access. Max Planck Digital Library Discipline-specific open access publishing practices and barriers to change: An evidence-based review Equity for open-access journal publishing Open access to research publications UKRI open access block grant What does is_paratext mean in the API? What do the types of oa_status (green, gold, hybrid, and bronze) mean? What is an OA license? Research Councils UK open access funding 2019-2020. Library & Archives Service at The London School of Hygiene & Tropical Medicine Welcome to the tidyverse