Incentivising research data sharing: a scoping review
Helen Buckley Woods; Stephen Pinfield
Wellcome Open Research, 21 December 2021. DOI: 10.12688/wellcomeopenres.17286.1

Abstract
Background: Numerous mechanisms exist to incentivise researchers to share their data. This scoping review aims to identify and summarise evidence of the efficacy of different interventions to promote open data practices and to provide an overview of current research.
Methods: This scoping review is based on data identified from Web of Science and LISTA, limited to publications from 2016 to 2021. A total of 1128 papers were screened, with 38 items included. Items were selected if they focused on designing or evaluating an intervention or presenting an initiative to incentivise sharing. Items comprised a mixture of research papers, opinion pieces and descriptive articles.
Results: Seven major themes in the literature were identified: publisher/journal data sharing policies, metrics, software solutions, research data sharing agreements in general, open science 'badges', funder mandates, and initiatives.
Conclusions: Key messages for data sharing include: the need to build on existing cultures and practices, meeting people where they are and tailoring interventions to support them; the importance of publicising and explaining the policy/service widely; the need to have disciplinary data champions to model good practice and drive cultural change; the requirement to resource interventions properly; and the imperative to provide robust technical infrastructure and protocols, such as labelling of data sets, use of DOIs, data standards and use of data repositories.

Introduction
The past decade has seen an intensified focus on the importance of openness and transparency in research processes. Broadly characterised as 'open science' or 'open research', there are now multiple initiatives and funder/institutional policies which aim to strengthen research reproducibility, access and utilisation through more open approaches. Important parts of this landscape include the introduction of open access business models by publishers, the creation of open infrastructure (including networks of repositories), and the development of policies supporting openness. In the area of data sharing, key initiatives include the release in 2015 of the Transparency and Openness Promotion (TOP) Guidelines produced by the Center for Open Science, and the launch in 2016 of the FAIR Principles (Findability, Accessibility, Interoperability, Reusability). Open data practices are also part of the EU's open science policy platform, for example in examining open data readiness in Europe (Nagy-Rothengass, 2016). Numerous initiatives exist to incentivise researchers to share their outputs, in the form of rewards and benefits for doing so, such as various credit and recognition schemes, or conversely, sanctions for non-compliance, for example delaying payment of a grant until compliance with a data sharing policy has been met.
In efforts to increase open access to research publications, approaches involving robust statements of requirements, ongoing compliance monitoring, and sanctions for non-compliance have achieved success (Pinfield et al., 2014), with high levels of compliance realised with the open access policies of the Wellcome Trust and the National Institutes of Health (NIH), and less compliance found with funders' policies that have weaker or no sanctions for non-compliance (Larivière & Sugimoto, 2018). Given that mandates and punitive policies appear to have had such success in accelerating open access to research, this raises the question of whether similar incentives are needed to encourage open data practices.

However, there are additional complications in the area of data sharing. The effect of discipline and field may be greater in data sharing than in other open research practices (Resnik et al., 2019). In addition, the deep complexity of the research system often masks the reasons why particular interventions work: a simple incentive may work in one discipline but not another, or be unnecessary in one field and not stringent enough in the next. The definition of success is also more complicated with open data interventions than with open access publishing: the data needs to be more than accessible, it needs to comply with the other elements of FAIR, and doing so may be a matter of degree rather than absolutes (see Hardwicke et al., 2018). Success in data sharing also depends upon the alignment of the incentives and activities of multiple actors in the research system, so that the practices of researchers are aligned with, for example, journal publishers' requirements and with funders' policies, as reflected in the work of the National Academies of Sciences, Engineering, and Medicine (2020) to advance open science practices. Moreover, for open data sharing to be successful, it must be a truly multi-professional endeavour, with the expertise of librarians, data scientists, software developers and many other professions needed to create spaces for different types of data to be curated, shared, discoverable and reusable in an ethical and timely way (Pasek, 2017).

The aim of this review is to identify and summarise evidence for the efficacy of known interventions and credit mechanisms to promote open data practices, and to provide an overview of current research in this field. It was carried out in support of Wellcome's role in the COVID-19 Therapeutics Accelerator (CTA), although it was designed to have wider application. This review makes a particular contribution to this vast area of activity by focussing on current research describing or evaluating researcher incentives as published in the scholarly literature. In view of recent comprehensive in-depth literature reviews on open research (Jubb, 2016; Zuiderwijk et al., 2020), the material in this review is limited to the last five years of publications.

Methods
A scoping review was chosen as the most appropriate approach for this project. It can be defined as a 'preliminary assessment of potential size and scope of available research literature. [It] aims to identify [the] nature and extent of research evidence (usually including ongoing research)' (Grant & Booth, 2009, p.95). Other methodological points to note were:
• As the review was initially created in support of Wellcome's role in the COVID-19 Therapeutics Accelerator (CTA), a protocol was not created.
• Quality assessment of evidence was not included in the review, as it was completed in a short timeframe, in line with scoping review norms.

A search of two key research databases, Web of Science and LISTA (Library, Information Science and Technology Abstracts), was undertaken to retrieve material to meet these requirements. No date, study-type or language limits were applied in the information retrieval process. In the selection process, material was limited to publications from 2016 to 2021. Initial searches took place in 2020 and were updated in June 2021. In order to complete the review in a short timeframe, a pragmatic approach to the discovery of relevant materials was chosen so as to limit the number of false positives retrieved. Terms for data (such as 'open data' or 'data sharing') were combined with terms for research actors (such as scientist* or publisher) or terms for relevant activity (such as reproducibility or reuse). These terms were tested against the references of a current review (Tenopir et al., 2020) to establish whether the strategy could retrieve them. Terms for incentives were not used, broader words and phrases being more effective. Interventions and other topics of interest were identified in the screening process within this pool of papers. An example search strategy is given in Box 1.

Box 1. Example search strategy: Web of Science Core Collection (timespan: all years)
#5 TI=(("Open access") and (data))
#4 TI=(("Research data") and (managing or sharing))
#3 TI=(("Data sharing") and (publisher* or author* or publication* or funder*))
#2 TI=(("Data sharing" or "data-sharing" or "data-reuse" or "data reuse" or "data management" or "data-management" or "open-data" or "open data" or "data standards" or "data-standards" or "data-standard" or "data standard" or "data availability" or "data-availability") and (efficiency or reliability or reproducibility))
#1 TI=(("Data sharing" or "data-sharing" or "data-reuse" or "data reuse" or "data use" or "data-use" or "data management" or "data-management" or "open-data" or "open data" or "data standards" or "data-standards" or "data-standard" or "data standard" or "data availability" or "data-availability") and (science or scientist* or scientific or research* or academic*))
Indexes for #2 to #5: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, BKCI-S, BKCI-SSH, ESCI. Indexes for #1: as above, plus CCR-EXPANDED and IC.

Results were transferred to EndNote (X9.3.2), where duplicates were removed. A total of 1128 results were then transferred to MS Excel (16.54), where they were screened for relevance and selected if they focused on designing or evaluating an intervention or presenting an initiative to incentivise sharing. General papers advocating the need for data sharing but without discussing specific interventions were excluded. Additional papers have been included in the references section for readers' interest. Please see the results section for more details. Papers in the 'incentive' set were further coded to record disciplinary area, context of intervention (such as funder/publisher/generic), and type of article.
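To make the shape of these Box 1 title searches concrete, the following minimal Python sketch shows how a group of data terms and a group of actor terms combine into a single Web of Science title query. The term lists are abridged from Box 1 and the code is purely illustrative; it is not the tooling actually used for the review.

# Illustrative only: build a Box 1-style Web of Science title query
# by combining data terms with research-actor terms (lists abridged).
data_terms = ['"data sharing"', '"data reuse"', '"data management"',
              '"open data"', '"data availability"']
actor_terms = ['science', 'scientist*', 'research*', 'academic*']

query = 'TI=(({}) and ({}))'.format(' or '.join(data_terms),
                                    ' or '.join(actor_terms))
print(query)  # TI=(("data sharing" or ... ) and (science or ... ))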
Due to the small number of items (38) in this set, it was not feasible to display correlations between these categories graphically, so this information is included in a narrative description.

Results
The results of the review are presented below in a narrative commentary, beginning with an overview, followed by a summary of the different categories of incentive identified. This is followed by a summary of incentives and their outcomes, before the report concludes with a discussion of the principal messages from successful data sharing interventions. A summary table of results with key data extracted is available as extended data (Woods & Pinfield, 2021); please see the data availability section for access. This allows an overview of the main features of each document in one table. A full reference list of papers cited in this report, including those in the results set, is presented at the end of the document.

There are 38 items in the results set, comprising 25 research papers and 13 opinion pieces/editorials. The majority of items (20) are from scientific or medical fields, with the remaining items found within social sciences publications. None were identified from the arts and humanities. The types of interventions were varied but can be roughly classified into seven groups: publisher/journal data sharing policies, metrics, software solutions, research data sharing agreements, open science 'badges', funder mandates, and other initiatives. Papers concerned with academic publishing are the largest group, comprising 14 papers; the next largest group is metrics, with seven papers. Other categories contain five or fewer items. See Figure 1 for a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram giving details of the search process. Additional papers were selected and added to the references section for readers' interest. These are surveys of researchers' views linked to related data sets, presenting a correlation between views and observed behaviours, such as Goldstein (…).

In one study of data availability statements within papers published in PLOS One, only 20% indicated compliance with the requirement to deposit data in a repository. These papers highlight the variation in compliance with policies across disciplines and fields, and the variance between the formulation of a high-level policy at publisher level and its enaction at journal level.

In the field/discipline group, Spicer & Steinbeck (2018) investigate the field of metabolomics and find that journals with an open data policy did not show a higher prevalence of published data. Investigating the research data sharing policies of highly cited journals in the fields of neuroscience, physics and operations research, Rousi & Laakso (2020) found a large variance in the existence, strength and content of data policies across research fields. The authors highlight the need to have policies which are tailored for specific fields, for example in the treatment of particular data types, and which capitalise on the existing practices of a discipline, such as the use of a repository endorsed by a research community. Vasilevsky et al. (2017) and Wiley (2018) both state the same aim: to investigate the 'pervasiveness and quality of data sharing policies' within their fields, biomedicine and engineering respectively. In a review of 318 journals' instructions to authors and editorial policies, Vasilevsky et al. (2017) found that only a minority of journals (11.9%) require data sharing as a pre-condition for publication.
A significant number (65%) of journals with a data sharing policy specifically made reference to reproducibility, but very few journals explicitly gave guidance on how best to make research data accessible and reusable. Wiley (2018) also analysed a sample of instructions to authors and data sharing policies, in engineering journals. Of the 28 journals analysed, the author classified 21 as 'weak' and four as 'strong', with four making no reference to open data. They found no correlation between open access journals and data sharing. They also found that journals with high impact factors were not more likely to have an open data policy.

Thelwall & Kousha (2017) focus on two evolutionary biology journals that have data sharing mandates and make widespread use of a repository. They found that the data mandates were completely successful in some journals, concluding that, as the major journals in the field have operated at this level of compliance since 2012, the field had transitioned into a position where data sharing had become a mainstream activity.

Kim et al. (2020) describe the data sharing policies of journals in the life, health and physical sciences through a sample of 700 journals indexed in the 2017 edition of Web of Science's Journal Citation Reports. The authors selected the top journals in each quartile from the 178 categories. The policies were categorised (absent, strong, weak), and the characteristics of each journal were recorded (such as geographical location of publisher, impact factor and discipline). Regression analyses and modelling were conducted to determine whether there was a relationship between journal characteristics and the strength of the data sharing policy. Within the sample, 44% had no data sharing policy, 17.9% had weak data sharing policies, and 38.1% had strong data sharing policies (expecting or mandating data sharing). The authors report an association between certain characteristics and the strength of data sharing policies. Journals from non-commercial publishers were more likely to have no data sharing policy than those from commercial publishers. Health science journals were more likely to have no data sharing policy than life sciences journals. Journals from European publishers were more likely to have a strong policy than those from North American publishers, which the authors suggest may be due to the influence of the numerous national open science initiatives in Europe. The authors conclude that these characteristics are significant factors in influencing journals' data sharing policies. They suggest future research taking a more nuanced approach to grading policies' success, as a 'strong' policy requiring a data availability statement does not ultimately mean that data is shared.

This sub-theme presents a mixed picture regarding journal data policies. Authors reported the complete absence of policies, and variance in compliance where they do exist. There is also variance in how authors define the strength or success of policies. In some fields, data is published regardless of the absence or presence of a policy. There appears to be a need for more detailed guidance on particular aspects of open data practices, such as how to prepare data for sharing, and how best to ensure reuse and the reproducibility of research using deposited data sets. However, where journal data sharing mandates are in place, there is evidence of widespread compliance amongst authors.
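As an illustration of the kind of analysis Kim et al. (2020) describe, the sketch below fits a multinomial logistic regression relating journal characteristics to the strength of a data sharing policy. It is a minimal sketch using simulated data; the column names and generated values are assumptions for illustration, not Kim et al.'s data or code.

# Minimal sketch: multinomial logistic regression of policy strength
# (0 = absent, 1 = weak, 2 = strong) on journal characteristics.
# All data below is simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 700  # sample size matching the review's description of Kim et al.
journals = pd.DataFrame({
    "commercial_publisher": rng.integers(0, 2, n),
    "european_publisher": rng.integers(0, 2, n),
    "impact_factor": rng.gamma(shape=2.0, scale=2.0, size=n),
    "policy": rng.choice([0, 1, 2], size=n, p=[0.44, 0.179, 0.381]),
})

X = sm.add_constant(journals[["commercial_publisher", "european_publisher",
                              "impact_factor"]])
fit = sm.MNLogit(journals["policy"], X).fit(disp=False)
print(fit.summary())  # coefficients for weak vs absent and strong vs absent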
Several editorials set out individual journals' positions. One (2019) describes a journal with an 'expects data' policy, under which authors are expected to provide a statement on data availability. Marks (2020) states a requirement for authors to make raw data fully available and accessible. In contrast, Levesque (2017), from the Journal of Youth and Adolescence, provides the journal's response to the publisher's mandate, seeking to find a balance between the benefits and costs of data sharing for authors who work with a 'wide variety of data'. He seeks to protect authors from the 'potential harms that can come from editors' unilateral mandates'. In an editorial for the Journal of the Association of Nurses in AIDS Care, Relf & Overstreet (2021) also present open data requirements for authors, including the pre-registration of clinical trials and systematic reviews.

Another study examined articles published from 2014 to 2017 under a journal's open data policy. It found that the policy increased the incidence of research data being shared, and of shared data that appeared reusable. However, there were still articles without available data, and with data that was not reusable when investigated. The authors point to errors such as missing values or typos, or the lack of an analysis script detailing the code used to run the analyses. These papers present different disciplinary perspectives on data sharing, providing an insight into the ethical challenges that accompany data sharing, particularly in some social science and humanities research.

This theme covers a variety of publisher interventions at publisher, discipline and individual journal level. There appears to be great variance in the existence of policies and in compliance with them. There is a need for more detailed guidance for authors on how to prepare their data, tailored to the discipline or field, to increase concordance with open data policies and successful re-use of research data.

Overall, five papers propose or evaluate incentives associated with metrics. Bierer et al. (2017) suggest that 'data authors' should be a recognised category of authorship, so that people are credited through citation. In a response to this proposal, Sydes & Ashby (2017) raise the issue of accrediting work on a clinical trial and propose the creation of a contributor database and the use of standardised terminology for people's roles using CRediT (Contributor Roles Taxonomy). Olfson et al. (2017), writing in the field of clinical medicine, also propose the development of an 'S-index' (sharing index) to measure data sharing and use. For each researcher who shared data, '…publications using their shared data would be ranked in descending order by number of citations and the value of their S-index would be the number of papers (N) in this list with N or more citations' (p. 5). The authors propose that this would allow data sharing to be measured appropriately and therefore included in career progression and other activities, and they call for a strong public funding commitment to realise this goal. Devriendt et al. (2020) also propose data-level metrics to credit authors for each reuse, such as downloads, data citations and so on. The assumption behind such arguments is that if people are publicly credited for their work in producing and sharing data, and can therefore accrue esteem within their community for their contribution, they will be more likely to make their data open.
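The S-index quoted above is directly analogous to the h-index, applied to the publications that reuse a researcher's shared data. A minimal sketch of the computation follows; the function name and the example citation counts are invented for illustration.

# S-index per Olfson et al. (2017): rank the publications that used a
# researcher's shared data by citations (descending); the S-index is the
# largest N such that the top N publications each have at least N citations.
def s_index(citation_counts):
    ranked = sorted(citation_counts, reverse=True)
    s = 0
    for n, cites in enumerate(ranked, start=1):
        if cites >= n:
            s = n
        else:
            break
    return s

# Example: five papers reused the shared data, with these citation counts.
print(s_index([21, 10, 4, 3, 1]))  # -> 3 (three papers have >= 3 citations)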
Mongeon et al. (2017) present a preliminary method to link data set creators to published authors in Web of Science (WoS) in order to understand data sharing practices and contributions across disciplines. All records from DataCite in 2015 were downloaded and matched with publications identified from Web of Science in 2013-2015, using the authors' names from both data sets. A large number of data set authors could be linked to authors of publications in WoS. The motivation behind the study was to gather information as a contribution toward the process of developing appropriate metrics for crediting data sharing. From the results of the study, the authors stress the importance of disciplinary differences when developing metrics for data sharing. The results found that data sharing is common in biomedical research, chemistry, medicine and biology, less so in the social sciences, and rare in the arts and humanities. It is also not possible to share data in some fields, for example where research explores sensitive topics or uses commercial material. The authors suggest 'any assessment of the level of data sharing must take into account what could (or should) have been shared, rather than the raw output.' (p. 552)

Kwon & Motohashi (2020) examine the incentive of increased citation of publications that have associated shared data. Using citation count data from Web of Science, they analysed over 310,000 articles indexed in 2010 and compared the number of citations of articles that shared data with those that did not. They found that for articles where data was shared, citations increased in the short term but decreased over time. The authors suggest two competing factors that affect researchers' motivation to share their data: the increased visibility of research due to data posting, but also the increased competition in the research community resulting from data sharing. Additional analysis found that the balance of these two factors changes depending upon the place of publication: in more prestigious journals the competition factor is weaker, while in less prestigious journals the visibility factor is weaker.

Christensen et al. (2019) also investigated the effect of data sharing on an article's citations. Publications in 17 high-impact journals that introduced a data sharing policy were analysed, before and after the introduction of the policy, in a natural experiment. Where authors shared data, an increase in citations was found, but this may be linked to other factors, such as different authors or types of articles being published after the policy change. The authors found no conclusive evidence of a link between data sharing and increased citations, but it may be one of a number of factors that led to higher citations of publications. There was no evidence as to why data sharing may increase citation rates. However, it may be one motivating factor for researchers to share their data, either in compliance with journal mandates or as an independent practice.

A variety of interventions are proposed or evaluated in this theme, focussed on establishing mechanisms to credit authorship of research data and reward data sharers.
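To illustrate the author-name matching that underpins Mongeon et al.'s DataCite-to-Web of Science linkage, here is a toy sketch. The normalisation rule and the sample records are assumptions chosen for illustration; real bibliometric linkage requires far more careful disambiguation than this.

# Toy sketch: link dataset creators to publication authors by a crude
# normalised name key ("Surname, First M." -> "surname f"). Illustrative only.
def name_key(name: str) -> str:
    last, _, first = name.partition(",")
    initial = first.strip()[:1].lower()
    return f"{last.strip().lower()} {initial}"

datacite_creators = {"Smith, Jane A.", "Lee, Min-Ho", "Okafor, Chidi"}
wos_authors = {"Smith, J.", "Garcia, Maria", "Lee, M."}

matches = ({name_key(c) for c in datacite_creators}
           & {name_key(a) for a in wos_authors})
print(sorted(matches))  # ['lee m', 'smith j']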
The remaining categories contain fewer papers, and so the summaries of the interventions below are briefer. In total, four initiatives were selected for inclusion. One allows data to be generated and accessed freely by a core group of research collaborators, via a data library, with more restricted access for other institutions and the public; data is shared in real time in both raw and processed formats.

Rowhani-Farid et al. (2020) report on a randomised controlled trial to assess the effectiveness of awarding badges for data sharing in BMJ Open. They report that the intervention did not motivate researchers to share data, and that the data sharing rate was low in both the control and intervention groups. This is in contrast to the work of the Center for Open Science (2021). In a related reproducibility study (Hardwicke et al., 2018), numerical values were reproducible without author involvement for nine articles, reproducible with author involvement for six, not fully reproducible with no author response for three, and not fully reproducible despite author involvement for seven articles. Unclear reporting of analytical methods is cited as the main barrier to reproducibility. The authors conclude, reinforcing their previous findings, that the availability of data alone is not sufficient to ensure the reproducibility of results.

Prado & Baranauskas (2016) look at the effects of data sharing software using actor network theory, suggesting that this provides a shared point of contact for numerous actors in the system and has the potential to improve data sharing through better collaboration. Prieto et al. (2017) also address software solutions for data sharing.

Finally, in the field of clinical trials, Gaba et al. (2020) assess the compliance of funded randomised controlled trials (RCTs) with the data-sharing policies of commercial and non-commercial funders in the years 2016-2018. Under half of the funders surveyed had a data sharing policy, with a subset of the policies mandating data sharing. Two random samples of 100 RCTs registered on ClinicalTrials.gov and funded by those with a data-sharing policy showed good coverage of data-sharing statements (77 non-commercially funded; 81 commercially funded RCTs), though an intention to share data was stated in fewer trials (12% non-commercial, 59% commercial). The authors suggest that, as a first step towards greater consistency in data sharing practices across RCTs, a collated, comprehensive and updated list of funders' policies should be created in order to work towards standardisation of such policies. A lack of incentives for researchers to comply with policies could also limit their success.

Mueller-Langer & Andreoli-Versbach (2018) report on the unintended negative consequences of data sharing agreements, such as researchers delaying sharing their data in order to fully exploit its potential in their own continuing research before publishing it. Pasek (2017), based at the University of Wyoming, describes an evaluation of government data sharing policies for US government research grants. The policies have had limited success, but through this evaluation a tailored research data management service is being developed to fill the gaps in the policy guidance. Finally, Polanin & Terzian (2019) report on a randomised controlled trial investigating the effects of data sharing agreements on researchers' willingness to share individual participant data. This study focussed on primary study authors whose studies were included in meta-analyses in the social sciences. Through searches of bibliographic databases, 1,207 authors were invited to participate in the study, with 580 (48.1%) allocated randomly to the intervention group (where participants received a hypothetical data-sharing agreement) and 627 (51.9%) to the control group (where participants did not receive the data-sharing agreement). Confounding factors were controlled for using numerous measures.
The study found that participants who received the data-sharing agreement were more willing to share their data set (24% more likely) than those in the non-intervention group. See Table 1 for a summary list of interventions from the included studies.

Limitations of the scoping review process
Scoping reviews are designed to provide a quick response to identify the ideas or interventions that have been published on a particular topic. As Tricco et al. (2016) suggest, this type of review is limited by its very nature, as it aims to provide breadth rather than depth of information. As in the case of this report, scoping reviews are often initiated as part of a wider project, to inform primary research or identify gaps in the literature. As highlighted by Grant & Booth (2009), prudence should be exercised when interpreting the findings in isolation, because quality assessment methods are not usually applied in a scoping review, as is the case in this review.

Of the 38 interventions listed above, 10 reported some degree of success. The key messages from these papers are presented below.

Chan et al. (2021) found success in a pilot researcher-led initiative to share large data sets within a COVID-19 research collaboration in cell biology (COMET), based on building a 'data sharing trust' amongst actors. The key factors for success were:
• an existing institutional data sharing platform was used;
• a data sharing agreement was put in place for the project;
• the COMET project executive committee monitored the pilot and intervened where necessary to resolve problems.
The data sharing agreement allowed all researchers to see the data, but permission had to be gained from the owner of the data to reuse it. In addition, authorship was offered to the team/lead investigators who generated the data initially. The agreement provided protection against being 'scooped' and rewarded data generation and sharing. The executive committee's monitoring proved important in practice: for example, it assigned additional personnel for project and data management to streamline the data sharing process, and resolved conflicts where different groups began working on similar or overlapping ideas.

Polanin & Terzian (2019) found evidence to support their hypothesis that a data-sharing agreement affects authors' attitudes and willingness to share individual participant data to be included in meta-analyses. Authors' concerns can also be addressed in advance through a data-sharing agreement, increasing the success of this intervention. The primary concerns identified in the study were: the need for adequate storage and accessibility of data; the limits of reuse once shared; the time taken to prepare the data for sharing; and the right to contribute to the meta-analysis that their data would be included within. The key message is that, when seeking data from primary study authors, meta-analysts should send a data-sharing agreement which addresses authors' key concerns, in addition to an email asking for the data set.
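For readers who want to see the shape of such a two-arm comparison, below is a minimal sketch of testing a difference in sharing rates between intervention and control groups. The arm sizes follow the review's description of Polanin & Terzian (2019); the 'willing to share' counts are invented for illustration and are not the study's results.

# Minimal sketch: compare willingness-to-share between two trial arms.
# Arm sizes (580 intervention, 627 control) follow the review's text;
# the outcome counts below are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

willing = [232, 188]     # hypothetical counts of authors willing to share
arm_size = [580, 627]    # intervention, control

z, p = proportions_ztest(willing, arm_size)
print(f"intervention: {willing[0]/arm_size[0]:.1%}, "
      f"control: {willing[1]/arm_size[1]:.1%}, z = {z:.2f}, p = {p:.4f}")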
Kwon & Motohashi (2020) highlight the need to address two factors when creating a data sharing policy: to harness the benefit of increased citations as a motivator for researchers to share data, and to mitigate the deleterious effects of this practice, namely increased competition. They make two recommendations. The first is increased legal protection for the owners of research data, enabling researchers greater control over who accesses and uses their data, possibly using a licensing scheme; this may be too complicated to realise in practice, but if practicable it would address one significant disincentive to data sharing. The second is to mandate that all researchers disclose their data, possibly as a condition of receiving public funds. The authors concede that researchers might undermine this measure by not curating their data appropriately for sharing.

Thelwall & Kousha (2017) found that data sharing mandates were highly successful in evolutionary biology journals that had signed up to the Joint Data Archiving Policy (JDAP; datadryad.org/pages/). These mandates have been in place since 2012, and data sharing has become a mainstream activity. The reason for success is not stated explicitly, but the effectiveness of the policy may lie in its joined-up approach across a field, with several journals signing up to the policy. The data was held at an existing digital repository (Dryad) designed and used for evolutionary biology research data, so linking to this existing resource meant a greater chance of success: it had already proved to be fit for purpose for this particular type of data, and people were already using it, so new habits did not have to be formed and there was no additional time to spend learning how to use new software to deposit the data. It was also set up so that authors received automated instructions on how to submit their data to the repository from the journal they would publish in.

A project at an Australian university (Hickson et al., 2016) aimed to improve researchers' adherence to data management processes. To plan a successful behaviour change strategy, the team surveyed researchers and interpreted the data using the ACOMB behaviour change model, in which Attitude (A) influences Capability (C), Opportunity (O) and Motivation (M), all of which interact to generate Behaviour (B). Attitude was targeted as the main barrier to good data management practices. To this end, interventions were designed to meet individuals' capabilities and needs, in order to affect attitudes and promote the use of safe and secure institutional data management services.

Pasek (2017) examined government data sharing policies for US government research grants, focusing on the data sharing policy of the National Science Foundation. The author identifies several shortfalls in the policy, including undefined terms and ambiguous definitions, with minimal guidance and few examples of data management plans provided for users. The author suggests that librarians are best placed to bridge the gap between the policy and its implementation by researchers: by supporting grantees in practically creating data management plans (DMPs), providing technical support to curate and share data, and providing expertise in metadata and data management standards, as well as expert knowledge of open data and open access initiatives.
In an intervention to introduce data management and sharing requirements for award holders at a funding organisation, Neylon (2017) stresses the importance of changing research culture, not just researcher behaviour. This finding points to the adoption of longer-term policy goals, and five recommendations are given for policy formulation:
• A policy has two functions: signalling that something is an important issue (such as data management), and setting out the steps to change people's behaviour. The two need to be in concert and mutually reinforcing.
• The message function of a policy is important, and will work even better if it goes with the grain of existing feelings or thinking on a topic. The policy needs to make sense to those within the funding organisation, inspiring individuals and cohering with, and enlivening, the organisation's culture and values. It should empower people to act and provide the necessary resources and infrastructure for those managing its implementation.
• Staff time (particularly that of Program Officers) and resources need to be made available to ensure that grantees are supported to understand and adhere to the policy, and that adherence is monitored and followed up. Otherwise, the message to researchers is that data management is not important after all.
• Staff time and resources need to be a long-term commitment to ensure policy success. Through continued practical commitment to implementation, adoption of the policies will become more widespread, with greater numbers of people changing their behaviour until it eventually becomes mainstream.
• Grantees who engage with these new ways of working become part of a community driving best practice, and through the visibility of their actions encourage others to join them. This advocacy role rewards researchers, creates more visibility and publicity for these practices (and the underlying policy), and creates a virtuous circle, attracting more people to join in.

Hardwicke et al. (2018) found that the rate of data sharing in the journal they examined (Cognition) had increased through an open data policy. An interrupted time-series analysis found that data availability statements had increased from 25% to 78% after the policy had been introduced, and that the proportion of shared data that was reusable moved from 22% to 62% (a sketch of this kind of analysis follows the list below). The reasons why the policy worked are not explored, but the authors did test whether it worked beyond face value, conducting several exploratory analyses in which they accessed data and repeated the research methods described in Cognition articles. The authors' suggestions resulting from their analysis were:
• Policies need to be consistently enforced to ensure data is available and reusable.
• Offer clear guidelines for authors on data management, including checklists to ensure procedures are followed.
• Assign a specific member of the editorial team to oversee data assessment and policy compliance.
• Journals need to provide clearer labelling of additional files, to describe exactly what they contain; avoid bland, time-wasting titles like 'supplementary data'.
• More consistent licensing is needed, so that it is clear whether a data set can be reused and there is no uncertainty about this.
• Use repositories instead of journals' own supplementary materials sections, to avoid broken links to information. Repositories future-proof materials by creating read-only, time-stamped files that have DOIs and can therefore be cited easily.
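Below is a minimal sketch of an interrupted time-series (segmented regression) analysis of the kind Hardwicke et al. (2018) report. The monthly window, the simulated proportions and the variable names are illustrative assumptions, not the study's data or code.

# Minimal sketch: segmented regression for an interrupted time series.
# y = b0 + b1*time + b2*policy + b3*time_since_policy + error
import numpy as np
import statsmodels.api as sm

months = np.arange(48)                      # 4-year observation window
policy = (months >= 24).astype(float)       # open data policy at month 24
time_since = np.where(policy == 1, months - 24, 0)

rng = np.random.default_rng(0)              # simulated monthly proportion of
y = 0.25 + 0.50 * policy + rng.normal(0, 0.03, months.size)  # articles w/ DAS

X = sm.add_constant(np.column_stack([months, policy, time_since]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # baseline level, baseline trend, level change, trend change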
Plomp et al. (2019) report on a data management service tailored to disciplinary areas within Delft University of Technology. The authors advocate pursuing interventions in data management even though success will be limited by systemic problems within the academic reward system. The key findings on delivering a successful intervention are to work with individual disciplinary communities and to have a dedicated member of staff (a 'data steward') who has expertise in data management within the subject, including first-hand knowledge of conducting research in a relevant subject area through a doctoral qualification. The role of academic data 'champions' was also highlighted: academic staff who model good practice and advise peers on data management. In conclusion, data stewards drive cultural change, enabled by a suitable technical infrastructure and by their understanding of existing cultural norms and ways of working in different disciplinary areas.

In summary, the key 'take home' points from the studies are:
• the need to build on existing cultures and practices, meeting people where they are and tailoring interventions to support them;
• the importance of publicising and explaining the policy/service widely;
• the need to have disciplinary data champions to model good practice and drive cultural change;
• the requirement to resource interventions properly;
• the imperative to provide robust technical infrastructure and protocols, such as labelling of data sets, use of DOIs, data standards and use of data repositories.
Whilst these studies all focus on particular contexts and actor groups, it is reasonable to assume that many of the insights they provide are transferable to other situations, although the extent of transferability will vary depending on a complex set of factors.

Conclusions
This scoping review of incentives and credit mechanisms for open data sharing is based on data identified from Web of Science and LISTA, limited to publications from 2016 to 2021. A total of 1128 papers were screened, with 38 items included. These items comprised a mixture of research papers, opinion pieces and descriptive articles. The material was categorised into seven groups according to intervention: publisher/journal data sharing policies, metrics, software solutions, research data sharing agreements in general, open science 'badges', funder mandates, and initiatives.

The material in this review does not reveal any new types of incentive or credit mechanism, nor do we claim to have identified any panaceas. However, the evidence included is drawn from many different contexts, disciplines and perspectives, and illustrates a range of activities and experiments. As such, this set of material reflects the complexity of the open data movement and the different success levels and approaches to open data sharing that exist across the disciplines. With numerous incentives being trialled within individual sectors of the research system, it seems that the cutting edge of the movement is now investigating aligned incentives as the most beneficial way forward (National Academies of Sciences, Engineering and Medicine, 2020). The evidence in this review also suggests, in line with previous evidence, that tailored incentives that are bespoke to particular disciplines and fields, harness existing working practices, and are appropriately resourced are more likely to be successful.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.

Acknowledgements
The authors would like to acknowledge and thank Professor James Wilsdon for leading discussions on the initial framing of the project and co-authoring the project proposal.

References
Pasek JE: Historical Development and Key Issues of Data Management Plan Requirements for National Science Foundation Grants: A Review.
Digging into data management in public-funded, international research in digital humanities.
Using Stakeholder and Pragmatic Analyses to Clarify the Scenario of Data Sharing in Scientific Software.
Editor's notes: Sharing qualitative research data, improving data literacy and establishing national data services.
Understanding the data-sharing debate in the context of Aotearoa/New Zealand: a narrative review on the perspectives of funders, publishers/journals, researchers, participants and Māori collectives.
Data standards can boost metabolomics research, and if there is a will, there is a way.
Thelwall M, Kousha K: Do journal data sharing mandates work? Life sciences evidence from Dryad.
Van Panhuis WG: Project Tycho 2.0: a new open access, global data infrastructure for infectious diseases to improve research capacity and innovation through north-south partnerships.
Reproducible and reusable research: are journal data sharing policies meeting the mark?
Yoon A, Kim Y: Social scientists' data reuse behaviors: Exploring the roles of attitudinal beliefs, attitudes, norms, and data repositories.
Erratum to: What incentives increase data sharing in health and medical research? A systematic review.
Open peer review

Reviewer report
Göttingen State and University Library, University of Göttingen, Göttingen, Germany

The review provides a comprehensive overview of mechanisms, and of evidence of their effectiveness, for incentivising researchers to share their data. The motivation and the objectives of the review are clearly stated. However, it was a bit surprising that the authors seem to consider policies as such to be incentives for researchers to share data, without further discussing their definition and scope of incentives in the introduction. In an earlier review, Rowhani-Farid et al. (2017) decided to exclude policies, as studies had observed that there was only low uptake in the considered field and a lack of rewards to researchers for sharing their data.

The study design and the overall search and selection strategy are well described. The analysis is very valuable as it covers a broad range of interventions and provides sufficient detail on the results and effectiveness from the studies and other materials included in the review (further details are provided in an openly available dataset). However, reproduction or replication would be impossible as the authors do not specify their search strategy in detail. Primarily, they provide search string examples (Box 1, p. 4) without stating which string was finally used for the initial search and the updated search (Figure 1, p. 5). In addition, it remains somewhat unclear whether retrieval was generally restricted to the title of the work, or targeted the topic (i.e. covered the title, abstract and keywords). Regarding the search strings used, it would have been good to explain why the truncation publisher* and not publish* was used (why is the focus on the publisher, and not on whether data is published?).

Other comments:
• Although research performing organisations are to some degree covered, it remains unclear why they are not mentioned in the Methods section (e.g. p. 3).
• Consider archiving the dataset in an open format (e.g. csv), together with a README file with basic documentation. The first column/variable contains more than one value (i.e. the data does not follow tidy data principles), and one row uses a reference that should be made explicit ("the above article").

The conclusions drawn by the authors are well supported by the results presented in the review. However, some statements describing the overall context of the review in the introduction would benefit from references; consider adding references to statements in the introduction (p. 3, "Numerous initiatives exist to incentivize researchers…", "delaying payment of a grant until compliance with a data sharing policy has been met"). Other statements seem to anticipate findings of the review (p. 3, "Simple incentives seem to work in one discipline but not another"). Finally, complete the missing reference to Larivière & Sugimoto (2018): it is provided in the text but not linked and listed in the references.

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reviewer report
Centre for Culture and Technology, Curtin University, Perth, WA, Australia

This report provides a scoping review of approaches to encourage research data sharing. It provides an initial literature list of 38 core items that examine or introduce interventions, approaches and evaluations for data sharing programs.
It provides a broad thematic analysis and an initial synthesis of the literature identified. It also provides some level of taxonomy of interventions that it would be interesting to see developed further in the future. One of the issues with an approach like this is that it is necessarily scoped down to specific databases. Nonetheless, one piece of work I think would enrich this analysis is the strand of work from Cooper and Springer of Ithaka S+R, which emphasises the role of communities (I would use the word culture, but there is a close alignment). To me this pulls the conclusion in a slightly different direction: the cutting edge is not so much incentives (at least in the sense of individual micro-economic interests) as how to shift culture so that incentives follow. Arguably this is a fine semantic distinction and the authors may of course disagree!

This report provides a useful and valuable summarisation and initial synthesis of an emerging and dynamic literature. Perhaps it is also worth a comment on how apparently small this core literature is? For an area that has defined the focus of policy makers for nearly a decade, it seems surprising that there are only 38 substantive studies or interventions actually examining what works!

Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes
Are sufficient details of the methods and analysis provided to allow replication by others?

Reviewer references:
Cooper & Springer (Ithaka S+R): Data Communities: A New Model for Supporting STEM Data Sharing.
Cooper & Springer (Ithaka S+R): Data Communities: Empowering Researcher-Driven Data Sharing in the Sciences.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Open Science, research data management, reproducibility
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report, 01 February 2022
https://doi.org/10.21956/wellcomeopenres.19111.r47798
© 2022 Tenopir C. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
School of Information Sciences, University of Tennessee, Knoxville, TN, USA

This is a complete review of the literature on why researchers share (or do not share) their research data and on incentives for sharing. Over 1100 papers were reviewed to develop seven themes that are relevant to why researchers share data. Publishers, funders, and data managers will be especially interested in the findings on how policies, metrics, software, agreements, badges, mandates and other initiatives influence the sharing of research data.

The two main limitations of the paper are clearly stated and acknowledged. The first is the choice of Web of Science and LISTA as the sources of the papers being reviewed. This is perhaps the reason for the second limitation: the data sharing incentives and discussions are almost all about scientific or medical data, with some referencing social science data. Results cannot, then, be extrapolated to humanities data or scholars of the humanities. These are not major limitations and, since even the definition of data may differ between the sciences and the humanities, this is more likely a strength that allows robust conclusions.

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Information sciences; publishing behaviors; research data behaviors
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report, 18 January 2022
https://doi.org/10.21956/wellcomeopenres.19111.r47797
Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes
Are the conclusions drawn adequately supported by the results presented in the review? Yes