Towards Computational Reproducibility: Researcher Perspectives on the Use and Sharing of Software

A peer-reviewed version of this preprint was published in PeerJ on 17 September 2018. View the peer-reviewed version (peerj.com/articles/cs-163), which is the preferred citable publication unless you specifically need to cite this preprint: AlNoamany Y, Borghi JA. 2018. Towards computational reproducibility: researcher perspectives on the use and sharing of software. PeerJ Computer Science 4:e163 https://doi.org/10.7717/peerj-cs.163

Yasmin AlNoamany (1) and John A. Borghi (2)
(1) University of California, Berkeley, United States
(2) California Digital Library
Corresponding author: Yasmin AlNoamany, yasminal@berkeley.edu

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26727v1 | CC BY 4.0 Open Access | rec: 19 Mar 2018, publ: 19 Mar 2018

ABSTRACT

Research software, which includes both the source code and executables used as part of the research process, presents a significant challenge for efforts aimed at ensuring reproducibility. In order to inform such efforts, we conducted a survey to better understand the characteristics of research software as well as how it is created, used, and shared by researchers. Based on the responses of 215 participants, representing a range of research disciplines, we found that researchers create, use, and share software in a wide variety of forms for a wide variety of purposes, including data collection, data analysis, data visualization, data cleaning and organization, and automation. More participants indicated that they use open source software than commercial software. While a relatively small number of programming languages (e.g. Python, R, JavaScript, C++, Matlab) are used by a large number, there is a long tail of languages used by relatively few. Between-group comparisons revealed that significantly more participants from computer science write source code and create executables than participants from other disciplines. Group comparisons related to knowledge of best practices related to software creation or sharing were not significant. While many participants indicated that they draw a distinction between the sharing and preservation of software, related practices and perceptions were often not aligned with those of the broader scholarly communications community.
1 INTRODUCTION

Research software is an important consideration when addressing concerns related to reproducibility (Hong (2011); Hong (2014); Stodden et al. (2014); Goble (2014)). Effective management and sharing of software saves time, increases transparency, and advances science (Prlić and Procter (2012)). At present, there are several converging efforts to ensure that software is positioned as a “first class” research object that is maintained, assessed, and cited in a similar fashion as scholarly publications (e.g. NIH (2016); Katz et al. (2013); Ram et al. (2017); Crouch et al. (2013)). However, while there is a burgeoning literature exploring the activities of researchers in relation to materials like data (Tenopir et al. (2015); Monteith et al. (2014); Kim and Stanton (2016)), those related to software have received less attention. Specifically, we have been unable to find a study that thoroughly examines how researchers use, share, and value their software.

In this paper we report the results of a survey designed to capture researcher practices and perceptions related to software. Survey questions addressed a variety of topics including:

1. What are the characteristics of research software?
2. How do researchers use software?
3. To what extent do current practices related to software align with those related to reproducibility?
4. How do researchers share software?
5. How do researchers preserve software?

After filtering, 215 researchers participated in our survey. Overall, our results demonstrate that researchers create software using a wide variety of programming languages, use software for a wide variety of purposes, have adopted some, but not all, practices recommended to address reproducibility, often share software outside of traditional scholarly communication channels, and generally do not actively preserve their software. Participants from computer science reported that they write source code and create executables significantly more than participants from other disciplines. However, other group comparisons largely did not reach statistical significance.

In the following sections, we provide a more detailed description of our findings. We start with an overview of the related literature (Section 2), then a description of our survey instrument and the demographic characteristics of our participants (Sections 3 and 4). In Section 5, we describe our findings related to the characteristics of research software and its usage. Responses to questions involving reproducibility-related practices are detailed in Section 6.
Section 7 outlines the responses to questions related to software sharing and preservation. We discuss the implications of our findings in Section 8. Finally, Section 9 contains a discussion of future work.

2 RELATED WORK

While there is an emerging body of research examining researcher practices, perceptions, and priorities for products like data (Fecher et al. (2015); Kratz and Strasser (2015); Tenopir et al. (2011); Tenopir et al. (2015)), work related to software has generally focused on how it is found, adopted, and credited (Howison and Bullard (2015b); Hucka and Graham (2016); Joppa et al. (2013)). For example, research examining the re-use of software demonstrates that the most common difficulty for users looking for software is a lack of documentation and that finding software is a difficult task even within technology companies (Sadowski et al. (2015)). However, as software is increasingly central to the research process (Borgman et al. (2012)), understanding its characteristics, its use, and the related practices and perceptions of researchers is an essential component of addressing reproducibility.

The term “reproducibility” has been applied to a variety of efforts aimed at addressing the misalignment between good research practices, including those emphasizing transparency and methodological rigor, and the academic reward system, which generally emphasizes the publication of novel and positive results (Nosek et al. (2012); Munafò et al. (2017)). Attempts to provide a cohesive lexicon for describing reproducibility-related activities are described elsewhere (Goodman et al. (2016)), but computational reproducibility generally refers to the description and sharing of software tools and data in such a manner as to enable their use and evaluation by others (Stodden et al. (2013)). Efforts aimed at fostering computational reproducibility are often focused on the sharing of source code but may also include the establishment of best practice guidelines related to how software tools are described, cited, and licensed (e.g. Stodden et al. (2016)).

Because of the costs of irreproducibility, there have been numerous calls urging researchers to more thoroughly describe and share their software (Barnes (2010); Ince et al. (2012); Joppa et al. (2013); Morin et al. (2012a)). Such calls are increasingly backed by mandates from funding agencies. For example, the Wellcome Trust now expects that grant recipients make available “any original software that is required to view datasets or to replicate analyses” (Wellcome (2017)). In parallel, a myriad of guidelines, tools, and organizations have emerged to help researchers address issues related to their software. Software-related best practices have been outlined for both individuals working in specific research disciplines (Eglen et al. (2017); Marwick (2017)) and for the research community in general (e.g. Piccolo and Frampton (2016); Sandve et al. (2013); Jimenez et al. (2017)). Literate programming tools such as Jupyter notebooks (Perez and Granger (2007)) allow researchers to combine data, code, comments, and outputs (e.g., figures and tables) in a human-readable fashion, while packaging and containerization platforms such as ReproZip (Chirigati et al. (2013)) and Docker (Boettiger (2015)) enable the tracking, bundling, and sharing of all of the software libraries and dependencies associated with a research project.
Through their integration with Github (https://github.com/), services like Figshare (https://figshare.com/) and Zenodo (https://zenodo.org/) allow researchers to deposit, archive, and receive persistent identifiers for their software. Training researchers to better develop, use, and maintain software tools is the primary focus of community organizations including The Carpentries (Wilson (2006); Teal et al. (2015)) and the Software Sustainability Institute (Crouch et al. (2013)), while scholarly communications-focused organizations such as Force11 have published guidelines for describing and citing software (Smith et al. (2016)).

As is evident in the above description, reproducibility-related efforts involving software often, but not always, overlap with those related to data. However, software presents a number of unique challenges compared to data and other research products. Even defining the bounds of the term “software” is challenging. For example, the National Institute of Standards and Technology (NIST) defines software as “Computer programs and associated data that may be dynamically written or modified during execution.” (Kissel et al. (2011)), a definition that is as recursive as it is potentially confusing for researchers without a background in computer science or software development. Software involves highly interdependent source and binary components that are sensitive to changes in operating environment and are difficult to track (Thain et al. (2015)). Evaluating the validity and reliability of software often requires inspecting source code, which is not possible when proprietary licenses are applied (Morin et al. (2012b); Stodden (2009)). Even when source code is technically available, important information about versions, parameters, and runtime environments is often missing from the scholarly record (Howison and Bullard (2015b); Pan et al. (2016); Stodden et al. (2013)). Seemingly small alterations, even for well described and openly available software tools, can lead to significant effects on analytical outputs (McCarthy et al. (2014)), a problem exacerbated by the fact that researchers often have minimal formal training in software development practices (Hannay et al. (2009); Joppa et al. (2013); Prabhu et al. (2011)). The iterative and collaborative nature of software development also means that it does not fit easily within existing academic incentive structures (Hafer and Kirkpatrick (2009); Howison and Herbsleb (2011); Howison and Herbsleb (2013)).

Research software is a growing concern for research service providers, including those affiliated with academic institutions. Often through workshops facilitated by The Carpentries, many have begun to provide guidance and training to researchers looking to create and use software tools. Services related to the preservation of software have also been explored by some academic libraries (e.g. Rios (2016)). However, these activities remain relatively nascent and it is presently unclear what a mature set of services related to research software and computational reproducibility might look like.
By identifying the characteristics of research software and its uses, and by elucidating the related practices and perceptions of researchers, we hope to establish a benchmark that can be applied to inform the development of such services in the future.

3 METHODS

In order to understand researcher practices and perceptions related to software and computational reproducibility, we designed and disseminated an online survey via the Qualtrics platform (www.qualtrics.com). The survey was advertised through blog posts, social media, and research-related email lists and listservs. Because the survey was distributed using different communication channels, we could not calculate the response rate. In Section 4, we detail the demographics of the survey's participants.

All study materials and procedures were approved by the University of California Berkeley Committee for Protection of Human Subjects and Office for the Protection of Human Subjects (protocol ID 2016-11-9358). The full text of the survey can be found in the supplementary materials. Before beginning the survey, participants were required to read and give their informed consent to participate. After reading the informed consent form (see survey), participants indicated their consent by checking a box. Information from participants who did not check this box was removed from all subsequent analyses. An anonymized version of our survey results (AlNoamany and Borghi (2018a)) as well as the code we used for its analysis (AlNoamany and Borghi (2018b)) are also available on Github (https://github.com/yasmina85/swcuration).

3.1 Survey Design

The survey was developed to capture a broad range of information about how researchers use, share, and value their software. The final survey instrument consisted of 56 questions (53 multiple choice, 3 open response), divided into four sections. In order, the sections focused on:

1. Demographics: Included questions related to the participant's research discipline, role, degree, age, institution, and funding sources (7 questions).
2. Characteristics of research software: Included questions related to how the participants use software and the characteristics of their software (17 questions).
3. Software sharing practices: Included questions related to how participants make their software available to others (18 questions).
4. How researchers assign value to software (14 questions).

Because only sections 2 and 3 addressed topics related to computational reproducibility, this paper is focused on responses to questions in the first three sections. Future work will further delineate how researchers value software.

We hypothesized that study participants would come to our survey with different levels of knowledge about software development practices and terminology. Therefore, we included a brief list of definitions in our survey for terms like “source code”, “executable”, and “open source software” that participants could refer back to at any time. Participants were not required to answer every question in order to proceed through the survey.

3.2 Filtering and Exclusion Criteria

We collected 330 responses to an online survey of software usage and sharing practices and perceptions from late January to early April of 2017. We excluded participants who started the survey but did not answer questions beyond the demographic section, leaving 215 unique responses. Though the majority of our participants indicated that they were from academia (Table 1), we did not exclude any participant due to institution type because of the possibility that participants could be affiliated with an academic or research program while conducting work in another sector. Institution names and disciplines were canonicalized (e.g. UCB and uc berkeley were mapped to UC Berkeley).
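The between-group comparisons reported in Sections 5 through 7 are chi-square tests computed over these cleaned response counts. The snippet below is a minimal sketch of this style of analysis in Python, assuming hypothetical response data and column names; it illustrates the approach rather than reproducing the study's actual analysis code, which is available in the repository linked above.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical, simplified responses; the study's real data and analysis code
# are in the repository cited above.
responses = pd.DataFrame({
    "discipline": ["computer science", "CS", "Biology", "Psychology", "Physics"],
    "writes_source_code": ["Yes", "Yes", "Yes", "No", "Yes"],
})

# Canonicalize free-text variants before tabulating (cf. mapping "UCB" and
# "uc berkeley" to "UC Berkeley" for institution names).
canonical = {"cs": "Computer Science", "computer science": "Computer Science"}
responses["discipline"] = (
    responses["discipline"].str.lower().map(canonical).fillna(responses["discipline"])
)

# Between-group comparison: computer science vs. all other disciplines,
# as a chi-square test of independence on the response counts.
responses["group"] = responses["discipline"].eq("Computer Science").map({True: "CS", False: "Non-CS"})
table = pd.crosstab(responses["group"], responses["writes_source_code"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2({dof}, N = {table.to_numpy().sum()}) = {chi2:.2f}, p = {p:.3f}")
```

Canonicalizing free-text answers before tabulating them keeps equivalent responses (e.g. different spellings of the same institution) from being counted separately.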
Table 1. Demographic breakdown for study participants.

Discipline (Count, Percentage): Computer Science (39, 18.3%); Biology (29, 13.6%); Psychology (28, 13.1%); Engineering (13, 6.1%); Interdisciplinary Programs (12, 5.6%); Mathematics (12, 5.6%); Physics (12, 5.6%); Earth Science (9, 4.2%); Library Sciences (9, 4.2%); Social Sciences (9, 4.2%); Others (41, 19.2%)

Institution (Count, Percentage): Academic: Research Focused (164, 77.0%); Academic: Teaching Focused (22, 10.3%); Government (13, 6.1%); Nonprofit (7, 3.3%); Academic: Medical School (3, 1.4%); Commercial (2, 0.9%); Other (2, 0.9%)

Role (Count, Percentage): Graduate Student (67, 31.5%); Postdoc (38, 17.8%); Research Faculty (35, 16.4%); Staff (29, 13.6%); Principal Investigator (25, 11.7%); Research Assistant (10, 4.7%); Undergraduate Student (2, 0.9%); Research (1, 0.5%); Other (6, 2.8%)

Highest degree (Count, Percentage): Doctorate (110, 51.9%); Masters (72, 34.0%); Bachelors (26, 12.3%); High school (3, 1.4%); Professional degree (1, 0.5%)

4 PARTICIPANT DEMOGRAPHICS

We asked participants about their age, professional degrees, professional title (or role), institutional affiliation, institution type, and sources of funding. The majority of these questions were multiple choice with an option for open response upon selecting “Other”.

The mean and median age of our participants were 35.8 and 33 years old, respectively. Reflecting the ubiquity of software within the research enterprise, participants were drawn from a wide variety of research disciplines, institution types, and roles. As shown in Table 1, the disciplines most represented in our sample were computer science, biology, and psychology. The majority of our participants were drawn from 129 different research-focused academic institutions (including 12% out of 215 researchers from UC Berkeley). Table 1 also shows that participants had a range of degrees and roles, with the most common being doctorate (51.9%, N = 215) and graduate student (31.5%, N = 215), respectively. In terms of funding, the most common responses were the National Science Foundation (NSF) (16.7%, N = 215) and the National Institutes of Health (7.0%, N = 215).

5 CHARACTERISTICS AND USE OF RESEARCH SOFTWARE

Before diving deeper into how researchers use their software, we wanted to identify its characteristics. In this section, we describe responses to questions related to the creation and use of source code and executables.

5.1 Source Code and Executables

We asked participants about the generation and use of source code and executables (i.e. Do you write source code? Do you use source code written by others? Do you create executables? Do you use executables created by others?).
We found that 84.2% out of 215 responding participants write source code and 89.8% out of 215 use source code written by others, while 53.7% out of 214 create executables and 80.4% use executables written by others.

Figure 1 shows that participants from computer science were significantly more likely to write source code [χ²(2, N = 215) = 8.93, p < 0.05], create executables [χ²(2, N = 214) = 22.67, p < 0.00001], and use executables created by others [χ²(2, N = 214) = 6.66, p < 0.05] than participants from other disciplines. Comparisons related to the use of others' source code did not reach statistical significance [χ²(2, N = 215) = 1.21, p = 0.55].

We also asked participants about the type of software they use (i.e. Do you use commercial software in the course of your research? Do you use open source software in the course of your research?). As shown in Figure 2, more participants indicated that they use open source software (94.9%, N = 214) than commercial software (72.8%, N = 214).

5.2 Programming Languages

In order to quantify the breadth of programming languages used in a research setting, we asked participants about the languages they use when writing their own code. Table 2 shows the top ten languages, which together account for 86.4% of the languages selected. The top used languages in our sample were Python, R, SQL, JavaScript, C++, Matlab, Java, C, PHP, and Perl. Python and R were the most used languages, selected by 64.0% and 57.0% of participants, respectively. For the most part, these results are in line with previous findings from Hucka and Graham (Hucka and Graham (2016)) and also match those of a recent study from Stack Overflow (Inc. (2016)). In total, 52 different languages were chosen, with the most common responses outside of the top ten being Ruby, C#, ASP, SAS, XML, XQuery, and Julia. Quantitatively measuring the use of programming languages in academic research is difficult due to the variability of reporting practices (Howison and Bullard (2015a)), but our results are largely in line with the rapid ascent of R and Python as tools for data science.

Table 2. The top 10 programming languages used by the researchers in our sample. A total of 214 participants answered this question. Together these languages represent 86.4% of the languages selected. Note that participants could choose more than one language.
Python: 137 selections (64.0%); R: 122 (57.0%); SQL: 60 (28.0%); JavaScript: 57 (26.6%); C++: 54 (25.2%); Matlab: 45 (21.0%); Java: 35 (16.4%); C: 25 (11.7%); PHP: 25 (11.7%); Perl: 21 (9.8%)

We also inquired about collaborative code development and the extent to which the same programming languages are used within a lab or research group. Though 53.3% of participants indicated that they write code collaboratively, we were surprised to see that only 33.0% indicated that everyone in the lab used the same language(s).
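Because participants could select more than one language, the counts in Table 2 are tallied per selection while the percentages are expressed per participant. The following sketch, using hypothetical multi-select responses rather than the survey data, shows how such a tally can be computed:

```python
import pandas as pd

# Hypothetical multi-select responses: each participant may list several languages.
languages = pd.Series([
    ["Python", "R"],
    ["Python", "C++", "SQL"],
    ["R"],
])

counts = languages.explode().value_counts()              # selections per language
percentages = (counts / len(languages) * 100).round(1)   # percent of participants
print(pd.DataFrame({"Selection": counts, "Percentage": percentages}))
```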
5.3 Use of Research Software

Previous scholarship (e.g. Borgman et al. (2012)) has indicated that researchers use software for a wide variety of purposes. To examine the purposes of research software, we asked participants about how they use their code or software (Figure 3). This question allowed them to choose multiple answers from a suggested list or input other answers.

Figure 3(a) shows that our participants use software primarily to analyze data, visualize data, clean and organize data, automate their work, and collect data. A total of 104 participants (55.7% out of 212 participants) responded that they use software for all five. “Other” responses included running simulations, building models, researching algorithms, testing methods, writing compilers, and sharing and publishing data. We also asked if researchers repurpose their code (i.e. using it for a project other than the one for which it was originally created) and found that 82% out of 208 participants indicated that they do so.

Figure 1. Significantly more participants from computer science stated that they write source code, create executables, and use executables created by others than participants from other disciplines. Panels (CS vs. non-CS): (a) Do you write source code? N = 215. (b) Do you use source code written by others? N = 215. (c) Do you create executables? N = 214. (d) Do you use executables created by others? N = 214.

Figure 2. The use of open source software versus commercial software. N = 214.

Figure 3. The purpose of using research software. Note that the first question could be answered with more than one choice. Panels: (a) How do you use code or software? N = 212. (b) Have you ever repurposed your code or software? N = 208.

We investigated how researchers collaborate on code writing within their research labs (Figure 4) (e.g. “Do you write code collaboratively (i.e. with another person or multiple people)?”, “Does everyone in your lab or research group write code using the same programming language(s)?”). We found that 49.8% (N = 200) of researchers write code collaboratively (Figure 4(b)), while only 30% (N = 201) use the same coding language in their research labs (Figure 4(a)).

Previous studies on the reuse of research software have focused mainly on licensing, review of code, and user awareness (Joppa et al. (2013); Morin et al. (2012a)). Reinforcing the need to establish best practices (or “good enough” practices, e.g. Wilson et al. (2017)) akin to those related to the management of research data, 79.8% of our participants (N = 208) indicated that they repurpose their code.

In an open response question, we asked participants to describe, in their own words, how they use their software and code. Here, again, participants indicated that they use software for a wide variety of purposes. One participant summed up the relationship between software and their research succinctly as “I use software for stimulus presentation, data acquisition, and data analysis and visualization - basically my entire research is run via computers (and thus code).” Similarly, another participant described the application of software within the field of computer science: “As a computer scientist, almost every aspect of my research from grant proposal to collecting data to analyzing data to writing up my results involves software. I write software.
I use software my collaborators or students write as well as open source and commercial software.”

6 REPRODUCIBILITY-RELATED PRACTICES

To understand how the practices of our participants align with those related to computational reproducibility, we asked a number of questions about adding comments to source code, generating documentation, communicating information about dependencies, and using “notebook” applications such as Jupyter. We also asked about awareness of coding conventions and best practices. The results of these questions are shown in Figure 5.

In line with previous research (Hannay et al. (2009); Joppa et al. (2013); Prabhu et al. (2011)), only 53.4% (N = 215) of our participants indicated that they have received formal training in coding conventions or best practices. At the same time, we found that many actually employ practices that are commonly cited for establishing computational reproducibility. For example, when asked “Do you include comments in your code?” and “When you share your code or software, do you provide information about dependencies?”, the majority of participants indicated that they include comments (98.0%, N = 204) and provide information about dependencies (72.2%, N = 169), respectively. However, substantially fewer indicated that they employ other practices such as generating documentation (60.0%, N = 205). While electronic lab notebooks have been cited as a tool for ensuring reproducibility (Kluyver et al. (2016)), only 43.6% (N = 206) of our participants indicated that they use them to write code.

Figure 4. Consistency of programming languages within research groups. Panels: (a) Does everyone in your lab or research group write code using the same programming language(s)? N = 201. (b) Do you write code collaboratively? N = 200.

Figure 5. Reproducibility practices in research. Panels (CS vs. non-CS): (a) Have you received training in coding conventions or best practices? N = 215. (b) Do you include comments in your code? N = 204. (c) Do you generate documentation for your code? N = 205. (d) Do you write code using a notebook? N = 206.

Figure 6. CS researchers tend to provide information about dependencies more than other disciplines. Panels: (a) When you share your code or software, do you share it alongside related files (e.g. datasets)? N = 161. (b) When you share your code or software, do you provide information about dependencies? N = 169.

Comparisons of responses by discipline (e.g. computer science versus others) or location (e.g. UC Berkeley versus others) were not significant, even, surprisingly, on questions related to training [discipline: χ²(1, N = 215) = 1.58, p = 0.21; location: χ²(2, N = 215) = 0.00, p = 1.00] (Figure 5). The lone exception was in providing information about dependencies. Significantly more respondents from computer science reported that they include information about dependencies when they share their code than participants from other disciplines [χ²(2, N = 169) = 17.755, p < 0.001].
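Dependency information of the kind asked about here can be recorded in many ways. As one hedged illustration (not a practice drawn from the survey itself), the snippet below writes the packages present in the current Python environment to a requirements-style file; tools such as pip, conda, or the packaging and containerization platforms discussed in Section 2 provide more complete solutions.

```python
# Minimal sketch (assumes Python 3.8+ for importlib.metadata): record the
# installed packages and their versions alongside shared code so that others
# can recreate a comparable environment.
from importlib.metadata import distributions

versions = {dist.metadata["Name"]: dist.version
            for dist in distributions() if dist.metadata["Name"]}
with open("requirements.txt", "w") as f:
    for name in sorted(versions, key=str.lower):
        f.write(f"{name}=={versions[name]}\n")
```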
7 SHARING AND PRESERVATION OF RESEARCH SOFTWARE

Making materials available for others to evaluate, use, and build upon is an essential component of ensuring reproducibility. Much of the previous work examining the sharing of research software has focused on the degree to which software is cited and described irregularly in the scholarly literature (Howison and Bullard (2015a); Smith et al. (2016)) and the relationship between code sharing and research impact (Vandewalle (2012)). In order to gain a greater understanding of how sharing practices relate to reproducibility, we asked our participants a variety of questions about how they share, find, and preserve software.

7.1 Sharing Research Software

Sharing Practices

While only half (50.5%, N = 198) of our participants indicated that they were aware of related community standards in their field or discipline, the majority indicated that they share software as part of the research process (computer science: 84.9%, other disciplines: 81.1%, N = 187) (Figure 7). Of 189 participants, 31% indicated that there were reasons their software could not be shared (Figure 7(b)). The most commonly cited restrictions on sharing were the inclusion of sensitive data, intellectual property concerns, and the time needed to prepare code for sharing. Comparisons between computer science and other disciplines on the sharing of code were not statistically significant [χ²(2, N = 187) = 1.58, p = 0.45].

We also checked whether participants share new versions of their code and found that 81% (N = 156) do so using a version control system. A group comparison related to the sharing of new versions was not statistically significant [CS vs non-CS: χ²(2, N = 156) = 2.2, p > 0.05] (Figure 7(c)); however, significantly more participants from computer science indicated that they share their code via a version control system than those from other disciplines [χ²(2, N = 185) = 16.4, p < 0.05] (Figure 7(d)).

Figure 7. Practices of code sharing. Panels (CS vs. non-CS): (a) Do you share the code or software created as part of your research? N = 187. (b) Is there any reason your code or software could not be shared? N = 189. (c) If you make a change to your code, do you share a new version? N = 156. (d) Do you use a version control system (e.g. Git, SVN)? N = 185.

Figure 8. Methods and formats for sharing software. Note that both of these questions could be answered with more than one response. Panels: (a) How do you tell people about the code or software you've shared? N = 165. (b) In what format do you typically share your code? N = 175.
Sharing Format and Platform

We asked our participants about how they share their code and found that 75.3% of 175 participants share their software in the form of source code, 7.6% share executables only, and 17.1% share both formats (Figure 8). As shown in Figure 8(a), participants indicated that they share their software through a variety of channels, with the most common being e-mail. The figure shows that 73.94% of the time our participants make their code available through direct communication and 50% make their code available through social media platforms. The participants who indicated that they use methods other than those listed in our survey generally responded that they do so using platforms such as Github or the Open Science Framework. A few researchers mentioned that they save their code along with the dataset in their institutional repository, while others indicated that they publicize their code via conferences.

7.2 Preserving Research Software

We asked a variety of questions about preserving research software (i.e. Do you take any steps to ensure that your code or software is preserved over the long term? How long do you typically save your code or software? Where do you save your code or software so that it is preserved over the long term?). While research software is a building block for ensuring reproducibility, 39.9% of participants (N = 183) do not prepare their code for long-term preservation.

How long do you typically save your code or software?

Figure 9(a) shows that the largest share of our participants (40.4% out of 162) preserve their code for more than eight years, but generally not in a way that maintains its use. In contrast, 7.4% (N = 162) of participants keep their code only until it is described in a publication, poster, or presentation. We found that 10.5% out of 162 researchers tend to keep their code for 3 years or less and 19.8% tend to keep their code for 4-8 years. Only 21.0% out of 162 researchers tend to keep their code for 8 years or more while maintaining it for future access and use.

Where do you save your code or software so that it is preserved over the long term?

In terms of where our participants preserve their code, Figure 9(b) shows that 76.2% of the time participants use code hosting sites such as Github. About 56.4% of the time, researchers use hard drives or external storage to preserve their code, and 38.1% of the time they preserve their code by putting it on the cloud. Only 12.7% of our participants indicated that they use archival repositories (e.g. figshare). The participants who entered “other” responses mentioned that they use a lab backup system, an organizational archive (e.g., a university server), their own PC, a language package registry (CRAN, PyPI, or similar), an internal SVN repository, or project-specific websites.

Figure 9. 76.2% of researchers use Github for preserving their code. Panels: (a) How long do you typically save your code or software? N = 162. (b) Where do you save your code or software so that it is preserved over the long term? N = 182.
Note that the second question could be answered with more than one choice.

We asked participants to define sharing and preserving in their own words. Their responses generally indicated that they make a distinction between the two concepts. For example, one participant stated succinctly, “sharing is making code available to others, in a readily usable form. Preserving is ensuring to the extent practical that the code will be usable as far into the future as possible.” However, several responses indicated that participants did not necessarily regard preservation as an active process that continues even after the conclusion of a particular project (e.g. “sharing means giving access to my code to someone else. Preserving means placing my code somewhere where it can remain and I will not delete it to save room or lose it when I switch computers or suffer a hard drive failure.”). In contrast, other responses indicated that participants were aware that preservation is important for reuse purposes and had a knowledge of preservation tools. For example, one researcher defined preserving software as “branching so that code remains compatible with different versions of overarching libraries (in my case) or with new coding standards and compilers”, and another stated “Preserving should be done via a system like LOCKSS that ensures that provides for redundancy. Sharing can be done via the web, but must include a license so that recipients know about their rights.”

8 DISCUSSION

Scholars throughout the humanities and sciences depend on software for a wide variety of purposes, including the collection, analysis, and visualization of data (Borgman et al. (2012); Hey et al. (2009)). Though ubiquitous, software presents significant challenges to efforts aimed at ensuring reproducibility. Our results demonstrate that researchers not only create and use software in a wide variety of forms and for a wide variety of purposes, but also that their software-related practices are often not completely in line with those associated with reproducibility. In particular, our results demonstrate that, while scholars often save their software for long periods of time, many do not actively preserve or maintain it. This perspective is perhaps best encapsulated by one of our participants who, when completing our open response question about the definition of sharing and preserving software, wrote “Sharing means making it publicly available on Github. Preserving means leaving it on GitHub”. We share this anecdote not to criticize our participants or their practices, but to illustrate the outstanding need for support services related to software.

In the broader scholarly communications space, there are several prominent frameworks that relate to the reproducibility of scholarly outputs. As part of an effort to advance data as a “first class” research product, the FAIR (Findable, Accessible, Interoperable, and Reusable) guidelines provide a measurable set of principles related to the management and sharing of research data (Wilkinson et al. (2016)). The FAIR principles are general enough that they can, with some modification, also be applied to software (Jimenez et al. (2017)). At the level of scholarly publications, the TOP (Transparency and Openness Promotion) guidelines (Nosek et al. (2015)) address citation standards and the availability of research materials including data and software.
A supplement to TOP, the Reproducibility Enhancement Principles (REP) (Stodden et al. (2016)), specifically targets disclosure issues related to computation and software. However, our results support previous work indicating that software still mostly exists outside the reputation economy of science (Howison and Herbsleb (2011)), which suggests that a more education-based approach, one that provides guidance about software before the publication stage, is necessary.

The majority of our participants indicated that they view code or software as a “first class” research product that should be assessed, valued, and shared in the same way as a journal article. However, our results also indicate that there remains a significant gap between this perception and actual practice. The fact that our participants indicated that they create and use software in a wide variety of forms and for a wide variety of purposes demonstrates the significant technical challenges inherent in ensuring computational reproducibility. In contrast, the lack of active preservation and the tendency to share software outside traditional (and measurable) scholarly communications channels displayed by our sample demonstrate the social and behavioral challenges. A significant difficulty in ensuring computational reproducibility is that researchers oftentimes do not treat their software as a “first class” research product. These findings reinforce the need for programs to train researchers in how to maintain their code during the active phase of their research.

At present, there are a number of initiatives focused on addressing the preservation and reproducibility of software. In the United States, the Software Preservation Network (SPN) (Meyerson et al. (2017)) represents an effort to coordinate efforts to ensure long-term access to software. The focus of SPN is generally on cultural heritage software rather than research software, but their work delineating issues related to metadata, governance, and technical infrastructure has substantial overlap with what is required for research software. In the United Kingdom, the Software Sustainability Institute trains researchers on how to develop better software and make better use of the supporting infrastructure (Crouch et al. (2013)). Befitting the necessity of training and preservation indicated by our study, a similar effort, the US Software Sustainability Initiative, was recently awarded funding by the National Science Foundation (NSF Award #1743188). While it is likely not possible for academic institutions to offer support services that cover the broad range of programming languages and applications described in our survey results, collaborating with such groups to create guidance and best practice recommendations may be a feasible first step in engaging with researchers about their software and code in the same manner as many research data management (RDM) initiatives now engage with them about their data.

While research stakeholders including academic institutions, publishers, and funders have an interest in tackling issues of computational reproducibility in order to ensure the integrity of the research process, our results demonstrate the complexity of doing so. One participant summed up why their code could not be made re-usable: “Most of my coding is project specific and not reusable between projects because the datasets I encounter are very variable.
I typically only generate packages for tasks such as getting data from a database (e.g., PubMed) and keeping RMarkdown templates in an orderly way.”

9 CONCLUSION AND FUTURE WORK

In this paper, we presented the results of surveying researchers across different institutions about software usage, sharing, and preservation. We also examined the practices used to manage software for ensuring the reproducibility and integrity of scientific research. Our results point to several interesting trends, including the widespread writing of source code and use of source code written by others, the variety of programming languages used and the lack of consistency even within the same lab or research group, the use of open source software over commercial software, and the adoption of some practices that help assure computational reproducibility, such as adding comments and documentation to code, but not others, most notably the general lack of active preservation. The findings of this paper inform ongoing conversations about research software and reproducibility by documenting current practices around research software. This will help service providers deliver tools and systems that help researchers manage their code and support the integrity and reproducibility of the scholarly ecosystem.

The present study was designed to capture a broad picture of how researchers use and share their software. For this reason, we were not able to provide a particularly granular picture of how individual practices relate to reproducible science outcomes. For example, while the majority of our participants responded that they include comments in their source code and generate documentation for their software, we were not able to make any judgment about whether or not the contents of these comments and documentation are sufficient to ensure reproducibility. Follow-up research is needed in order to gain a more nuanced understanding of how processes related to the creation and use of research software relate to reproducibility. However, despite these limitations, our results indicate several potential directions for future library services centered on helping researchers create, use, and share their software and assure computational reproducibility.

ACKNOWLEDGMENTS

We would like to thank our colleagues at the UC Berkeley Library and the California Digital Library for their valuable suggestions and insightful comments throughout this project.

REFERENCES

AlNoamany, Y. and Borghi, J. A. (2018a). Data: Researcher perspectives on the use and sharing of software.
AlNoamany, Y. and Borghi, J. A. (2018b). Software study code.
Barnes, N. (2010). Publish your computer code: it is good enough. Nature, 467(7317):753–753.
Boettiger, C. (2015). An introduction to Docker for reproducible research. ACM SIGOPS Operating Systems Review, 49(1):71–79.
Borgman, C. L., Wallis, J. C., and Mayernik, M. S. (2012). Who's got the data? Interdependencies in science and technology collaborations. Computer Supported Cooperative Work (CSCW), 21(6):485–523.
Chirigati, F., Shasha, D., and Freire, J. (2013). ReproZip: Using provenance to support computational reproducibility. In Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance, TaPP '13, pages 1–4. USENIX Association.
Crouch, S., Hong, N. C., Hettrick, S., Jackson, M., Pawlik, A., Sufi, S., Carr, L., Roure, D. D., Goble, C., and Parsons, M. (2013).
The Software Sustainability Institute: Changing research software attitudes and practices. Computing in Science & Engineering, 15(6):74–80.
Eglen, S. J., Marwick, B., Halchenko, Y. O., Hanke, M., Sufi, S., Gleeson, P., Silver, R. A., Davison, A. P., Lanyon, L., Abrams, M., Wachtler, T., Willshaw, D. J., Pouzat, C., and Poline, J.-B. (2017). Toward standard practices for sharing computer code and programs in neuroscience. Nature Neuroscience, 20:770.
Fecher, B., Friesike, S., and Hebing, M. (2015). What drives academic data sharing? PLOS ONE, 10(2):1–25.
Goble, C. (2014). Better software, better research. IEEE Internet Computing, 18(5):4–8.
Goodman, S. N., Fanelli, D., and Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341):341ps12–341ps12.
Hafer, L. and Kirkpatrick, A. E. (2009). Assessing open source software as a scholarly contribution. Communications of the ACM, 52(12):126–129.
Hannay, J. E., MacLeod, C., Singer, J., Langtangen, H. P., Pfahl, D., and Wilson, G. (2009). How do scientists develop and use scientific software? In Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering, SECSE '09, pages 1–8. IEEE Computer Society.
Hey, T., Tansley, S., and Tolle, K. (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.
Hong, N. C. (2011). Digital preservation and curation: the danger of overlooking software. The Preservation of Complex Objects, page 25.
Hong, N. C. (2014). Dealing with software: the research data issues.
Howison, J. and Bullard, J. (2015a). How is software visible in the scientific literature? Technical report, Univ. of Texas.
Howison, J. and Bullard, J. (2015b). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology.
Howison, J. and Herbsleb, J. D. (2011). Scientific software production: Incentives and collaboration. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW '11, pages 513–522. ACM.
Howison, J. and Herbsleb, J. D. (2013). Incentives and integration in scientific software production. In Proceedings of the ACM 2013 Conference on Computer Supported Cooperative Work, CSCW '13, pages 459–470. ACM.
Hucka, M. and Graham, M. J. (2016). Software search is not a science, even among scientists. CoRR, abs/1605.02265.
Inc., S. E. (2016). Developer survey results 2016.
Ince, D. C., Hatton, L., and Graham-Cumming, J. (2012). The case for open computer programs. Nature, 482:485.
Jimenez, R., Kuzak, M., Alhamdoosh, M., Barker, M., Batut, B., Borg, M., Capella-Gutierrez, S., Chue Hong, N., Cook, M., Corpas, M., Flannery, M., Garcia, L., Gelpí, J., Gladman, S., Goble, C., González Ferreiro, M., Gonzalez-Beltran, A., Griffin, P., Grüning, B., Hagberg, J., Holub, P., Hooft, R., Ison, J., Katz, D., Leskošek, B., López Gómez, F., Oliveira, L., Mellor, D., Mosbergen, R., Mulder, N., Perez-Riverol, Y., Pergl, R., Pichler, H., Pope, B., Sanz, F., Schneider, M., Stodden, V., Suchecki, R., Svobodová Vařeková, R., Talvik, H., Todorov, I., Treloar, A., Tyagi, S., van Gompel, M., Vaughan, D., Via, A., Wang, X., Watson-Haigh, N., and Crouch, S. (2017).
Four simple recommendations to encourage best practices in research software [version 1; referees: 3 approved]. F1000Research, 6(876).
Joppa, L. N., McInerny, G., Harper, R., Salido, L., Takeda, K., O'Hara, K., Gavaghan, D., and Emmott, S. (2013). Troubling trends in scientific software use. Science, 340(6134):814–815.
Katz, D. S., Allen, G., Hong, N. C., Parashar, M., and Proctor, D. (2013). First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE): Submission and peer-review process, and results. arXiv preprint arXiv:1311.3523.
Kim, Y. and Stanton, J. M. (2016). Institutional and individual factors affecting scientists' data-sharing behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology, 67(4):776–799.
Kissel, R., Blank, R., and Secretary, A. (2011). Glossary of key information security terms. NIST Interagency Report NISTIR 7298 Revision 1, National Institute of Standards and Technology.
Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., Willing, C., and the Jupyter development team (2016). Jupyter notebooks: A publishing format for reproducible computational workflows. In Loizides, F. and Schmidt, B., editors, Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87–90. IOS Press.
Kratz, J. E. and Strasser, C. (2015). Researcher perspectives on publication and peer review of data. PLOS ONE, 10(2):1–21.
Marwick, B. (2017). Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory, 24(2):424–450.
McCarthy, D. J., Humburg, P., Kanapin, A., Rivas, M. A., Gaulton, K., Cazier, J.-B., and Donnelly, P. (2014). Choice of transcripts and software has a large effect on variant annotation. Genome Medicine, 6(3):26.
Meyerson, J., Vowell, Z., Hagenmaier, W., Leventhal, A., Roke, E. R., Rios, F., and Walsh, T. (2017). The Software Preservation Network (SPN): A community effort to ensure long term access to digital cultural heritage. D-Lib Magazine, 23(5/6).
Monteith, J. Y., McGregor, J. D., and Ingram, J. E. (2014). Scientific research software ecosystems. In Proceedings of the 2014 European Conference on Software Architecture Workshops, ECSAW '14, pages 9:1–9:6. ACM.
Morin, A., Urban, J., Adams, P. D., Foster, I., Sali, A., Baker, D., and Sliz, P. (2012a). Shining light into black boxes. Science, 336(6078):159–160.
Morin, A., Urban, J., and Sliz, P. (2012b). A quick guide to software licensing for the scientist-programmer. PLOS Computational Biology, 8(7):1–7.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., and Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1):0021.
NIH (2016). Strategies for NIH data management, sharing, and citation.
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, S., Chambers, C. D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J., Glennerster, R., Goroff, D., Green, D.
P., Hesse, B., Humphreys, M., Ishiyama, J., Karlan, D., Kraut, A., Lupia, A., Mabry, P., Madon, T., Malhotra, N., Mayo-Wilson, E., McNutt, M., Miguel, E., Paluck, E. L., Simonsohn, U., Soderberg, C., Spellman, B. A., Turitto, J., VandenBos, G., Vazire, S., Wagenmakers, E. J., Wilson, R., and Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242):1422–1425.
Nosek, B. A., Spies, J. R., and Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6):615–631.
Pan, X., Yan, E., and Hua, W. (2016). Disciplinary differences of software use and impact in scientific literature. Scientometrics, 109(3):1593–1610.
Perez, F. and Granger, B. E. (2007). IPython: A system for interactive scientific computing. Computing in Science & Engineering, 9(3):21–29.
Piccolo, S. R. and Frampton, M. B. (2016). Tools and techniques for computational reproducibility. GigaScience, 5(1):30.
Prabhu, P., Jablin, T. B., Raman, A., Zhang, Y., Huang, J., Kim, H., Johnson, N. P., Liu, F., Ghosh, S., Beard, S., Oh, T., Zoufaly, M., Walker, D., and August, D. I. (2011). A survey of the practice of computational science. In State of the Practice Reports, SC '11, pages 19:1–19:12. ACM.
Prlić, A. and Procter, J. B. (2012). Ten simple rules for the open development of scientific software. PLOS Computational Biology, 8(12):e1002802.
Ram, K., Katz, D., Carver, J., Gesing, S., and Weber, N. (2017). SI2-S2I2 conceptualization: Conceptualizing a US research software sustainability institute (URSSI).
Rios, F. (2016). The pathways of research software preservation: An educational and planning resource for service development. D-Lib Magazine, 22(7/8).
Sadowski, C., Stolee, K. T., and Elbaum, S. (2015). How developers search for code: a case study. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pages 191–201. ACM.
Sandve, G. K., Nekrutenko, A., Taylor, J., and Hovig, E. (2013). Ten simple rules for reproducible computational research. PLOS Computational Biology, 9(10):1–4.
Smith, A. M., Katz, D. S., and Niemeyer, K. E. (2016). Software citation principles. PeerJ Computer Science, 2:e86.
Stodden, V. (2009). The legal framework for reproducible scientific research: Licensing and copyright. Computing in Science & Engineering, 11(1):35–40.
Stodden, V., Guo, P., and Ma, Z. (2013). Toward reproducible computational research: An empirical analysis of data and code policy adoption by journals. PLOS ONE, 8(6):1–8.
Stodden, V., Leisch, F., and Peng, R. D. (2014). Implementing Reproducible Research. CRC Press.
Stodden, V., McNutt, M., Bailey, D. H., Deelman, E., Gil, Y., Hanson, B., Heroux, M. A., Ioannidis, J. P., and Taufer, M. (2016). Enhancing reproducibility for computational methods. Science, 354(6317):1240–1241.
Teal, T. K., Cranston, K. A., Lapp, H., White, E., Wilson, G., Ram, K., and Pawlik, A. (2015). Data Carpentry: workshops to increase data literacy for researchers. International Journal of Digital Curation, 10(1):135–143.
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., Manoff, M., and Frame, M. (2011). Data sharing by scientists: practices and perceptions. PLOS ONE, 6(6):e21101.
Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., Pollock, D., and Dorsett, K.
(2015). Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLOS ONE, 10(8):1–24.
Thain, D., Ivie, P., and Meng, H. (2015). Techniques for preserving scientific software executions: Preserve the mess or encourage cleanliness? In Proceedings of the 12th International Conference on Digital Preservation (iPRES).
Vandewalle, P. (2012). Code sharing is associated with research impact in image processing. Computing in Science & Engineering, 14(4):42–47.
Wellcome (2017). Policy on data, software and materials management and sharing.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., 't Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3:160018.
Wilson, G. (2006). Software Carpentry: Getting scientists to write better code by making them more productive. Computing in Science & Engineering, 8(6):66–69.
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., and Teal, T. K. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6):1–20.