Towards Computational Reproducibility: Researcher Perspectives on the Use and Sharing of Software

A peer-reviewed version of this preprint was published in PeerJ on 17 September 2018. View the peer-reviewed version (peerj.com/articles/cs-163), which is the preferred citable publication unless you specifically need to cite this preprint: AlNoamany Y, Borghi JA. 2018. Towards computational reproducibility: researcher perspectives on the use and sharing of software. PeerJ Computer Science 4:e163 https://doi.org/10.7717/peerj-cs.163

Yasmin AlNoamany (1) and John A. Borghi (2)
(1) University of California, Berkeley, United States
(2) California Digital Library
Corresponding author: Yasmin AlNoamany, yasminal@berkeley.edu

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26727v1 | CC BY 4.0 Open Access | rec: 19 Mar 2018, publ: 19 Mar 2018

ABSTRACT

Research software, which includes both the source code and executables used as part of the research process, presents a significant challenge for efforts aimed at ensuring reproducibility. In order to inform such efforts, we conducted a survey to better understand the characteristics of research software as well as how it is created, used, and shared by researchers. Based on the responses of 215 participants, representing a range of research disciplines, we found that researchers create, use, and share software in a wide variety of forms for a wide variety of purposes, including data collection, data analysis, data visualization, data cleaning and organization, and automation. More participants indicated that they use open source software than commercial software. While a relatively small number of programming languages (e.g. Python, R, JavaScript, C++, Matlab) are used by a large number, there is a long tail of languages used by relatively few. Between-group comparisons revealed that significantly more participants from computer science write source code and create executables than participants from other disciplines. Group comparisons related to knowledge of best practices related to software creation or sharing were not significant. While many participants indicated that they draw a distinction between the sharing and preservation of software, related practices and perceptions were often not aligned with those of the broader scholarly communications community.
1 INTRODUCTION

Research software is an important consideration when addressing concerns related to reproducibility (Hong (2011); Hong (2014); Stodden et al. (2014); Goble (2014)). Effective management and sharing of software saves time, increases transparency, and advances science (Prlić and Procter (2012)). At present, there are several converging efforts to ensure that software is positioned as a “first class” research object that is maintained, assessed, and cited in a similar fashion as scholarly publications (e.g. NIH (2016); Katz et al. (2013); Ram et al. (2017); Crouch et al. (2013)). However, while there is a burgeoning literature exploring the activities of researchers in relation to materials like data (Tenopir et al. (2015); Monteith et al. (2014); Kim and Stanton (2016)), those related to software have received less attention. Specifically, we have been unable to find a study that thoroughly examines how researchers use, share, and value their software.

In this paper we report the results of a survey designed to capture researcher practices and perceptions related to software. Survey questions addressed a variety of topics including:

1. What are the characteristics of research software?
2. How do researchers use software?
3. To what extent do current practices related to software align with those related to reproducibility?
4. How do researchers share software?
5. How do researchers preserve software?

After filtering, 215 researchers participated in our survey. Overall, our results demonstrate that researchers create software using a wide variety of programming languages, use software for a wide variety of purposes, have adopted some, but not all, practices recommended to address reproducibility, often share software outside of traditional scholarly communication channels, and generally do not actively preserve their software. Participants from computer science reported that they write source code and create executables significantly more than participants from other disciplines. However, other group comparisons largely did not reach statistical significance.

In the following sections, we provide a more detailed description of our findings. We start with an overview of the related literature (Section 2), then a description of our survey instrument and the demographic characteristics of our participants (Sections 3 and 4). In Section 5, we describe our findings related to the characteristics of research software and its usage. Responses to questions involving reproducibility-related practices are detailed in Section 6.
Section 7 outlines the responses to questions related to software sharing and preservation. We discuss the implications of our findings in Section 8. Finally, Section 9 contains a discussion of future work.

2 RELATED WORK

While there is an emerging body of research examining researcher practices, perceptions, and priorities for products like data (Fecher et al. (2015); Kratz and Strasser (2015); Tenopir et al. (2011); Tenopir et al. (2015)), work related to software has generally focused on how it is found, adopted, and credited (Howison and Bullard (2015b); Hucka and Graham (2016); Joppa et al. (2013)). For example, research examining the re-use of software demonstrates that the most common difficulty for users looking for software is a lack of documentation and that finding software is a difficult task even within technology companies (Sadowski et al. (2015)). However, as software is increasingly central to the research process (Borgman et al. (2012)), understanding its characteristics, its use, and the related practices and perceptions of researchers is an essential component of addressing reproducibility.

The term “reproducibility” has been applied to a variety of efforts aimed at addressing the misalignment between good research practices, including those emphasizing transparency and methodological rigor, and the academic reward system, which generally emphasizes the publication of novel and positive results (Nosek et al. (2012); Munafò et al. (2017)). Attempts to provide a cohesive lexicon for describing reproducibility-related activities are described elsewhere (Goodman et al. (2016)), but computational reproducibility generally refers to the description and sharing of software tools and data in such a manner as to enable their use and evaluation by others (Stodden et al. (2013)). Efforts aimed at fostering computational reproducibility are often focused on the sharing of source code but may also include the establishment of best practice guidelines related to how software tools are described, cited, and licensed (e.g. Stodden et al. (2016)).

Because of the costs of irreproducibility, there have been numerous calls urging researchers to more thoroughly describe and share their software (Barnes (2010); Ince et al. (2012); Joppa et al. (2013); Morin et al. (2012a)). Such calls are increasingly backed by mandates from funding agencies. For example, the Wellcome Trust now expects that grant recipients make available “any original software that is required to view datasets or to replicate analyses” (Wellcome (2017)). In parallel, a myriad of guidelines, tools, and organizations have emerged to help researchers address issues related to their software. Software-related best practices have been outlined for both individuals working in specific research disciplines (Eglen et al. (2017); Marwick (2017)) and for the research community in general (e.g. Piccolo and Frampton (2016); Sandve et al. (2013); Jimenez et al. (2017)). Literate programming tools such as Jupyter notebooks (Perez and Granger (2007)) allow researchers to combine data, code, comments, and outputs (e.g., figures and tables) in a human-readable fashion, while packaging and containerization platforms such as ReproZip (Chirigati et al. (2013)) and Docker (Boettiger (2015)) enable the tracking, bundling, and sharing of all of the software libraries and dependencies associated with a research project.
Through their integration with Github (https://github.com/), services like Figshare (https://figshare.com/) and Zenodo (https://zenodo.org/) allow researchers to deposit, archive, and receive persistent identifiers for their software. Training researchers to better develop, use, and maintain software tools is the primary focus of community organizations including The Carpentries (Wilson (2006); Teal et al. (2015)) and the Software Sustainability Institute (Crouch et al. (2013)), while scholarly communications-focused organizations such as Force11 have published guidelines for describing and citing software (Smith et al. (2016)).

As is evident in the above description, reproducibility-related efforts involving software often, but not always, overlap with those related to data. However, software presents a number of unique challenges compared to data and other research products. Even defining the bounds of the term “software” is challenging. For example, the National Institute of Standards and Technology (NIST) defines software as “Computer programs and associated data that may be dynamically written or modified during execution.” (Kissel et al. (2011)), a definition that is as recursive as it is potentially confusing for researchers without a background in computer science or software development. Software involves highly interdependent source and binary components that are sensitive to changes in operating environment and are difficult to track (Thain et al. (2015)). Evaluating the validity and reliability of software often requires inspecting source code, which is not possible when proprietary licenses are applied (Morin et al. (2012b); Stodden (2009)). Even when source code is technically available, important information about versions, parameters, and runtime environments is often missing from the scholarly record (Howison and Bullard (2015b); Pan et al. (2016); Stodden et al. (2013)). Seemingly small alterations, even for well described and openly available software tools, can lead to significant effects on analytical outputs (McCarthy et al. (2014)), a problem exacerbated by the fact that researchers often have minimal formal training in software development practices (Hannay et al. (2009); Joppa et al. (2013); Prabhu et al. (2011)). The iterative and collaborative nature of software development also means that it does not fit easily within existing academic incentive structures (Hafer and Kirkpatrick (2009); Howison and Herbsleb (2011); Howison and Herbsleb (2013)).

Research software is a growing concern for research service providers, including those affiliated with academic institutions. Often through workshops facilitated by The Carpentries, many have begun to provide guidance and training to researchers looking to create and use software tools. Services related to the preservation of software have also been explored by some academic libraries (e.g. Rios (2016)). However, these activities remain relatively nascent and it is presently unclear what a mature set of services related to research software and computational reproducibility might look like.
By identifying the characteristics of research software and its uses, and by elucidating the related practices and perceptions of researchers, we hope to establish a benchmark that can be applied to inform the development of such services in the future.

3 METHODS

In order to understand researcher practices and perceptions related to software and computational reproducibility, we designed and disseminated an online survey via the Qualtrics platform (www.qualtrics.com). The survey was advertised through blog posts, social media, and research-related email lists and listservs. Because the survey was distributed using different communication channels, we could not calculate the response rate. In Section 4, we detail the demographics of the survey's participants.

All study materials and procedures were approved by the University of California Berkeley Committee for Protection of Human Subjects and Office for the Protection of Human Subjects (protocol ID 2016-11-9358). The full text of the survey can be found in the supplementary materials. Before beginning the survey, participants were required to read and give their informed consent to participate. After reading the informed consent form (see survey), participants indicated their consent by checking a box. Information from participants who did not check this box was removed from all subsequent analyses. An anonymized version of our survey results (AlNoamany and Borghi (2018a)) as well as the code we used for its analysis (AlNoamany and Borghi (2018b)) are also available on Github (https://github.com/yasmina85/swcuration).

3.1 Survey Design

The survey was developed to capture a broad range of information about how researchers use, share, and value their software. The final survey instrument consisted of 56 questions (53 multiple choice, 3 open response), divided into four sections. In order, the sections focused on:

1. Demographics: Included questions related to the participant's research discipline, role, degree, age, institution, and funding sources (7 questions).
2. Characteristics of research software: Included questions related to how the participants use software and the characteristics of their software (17 questions).
3. Software sharing practices: Included questions related to how participants make their software available to others (18 questions).
4. How researchers assign value to software (14 questions).

Because only sections 2 and 3 addressed topics related to computational reproducibility, this paper is focused on responses to questions in the first three sections. Future work will further delineate how researchers value software.

We hypothesized that study participants would come to our survey with different levels of knowledge about software development practices and terminology. Therefore, we included a brief list of definitions in our survey for terms like “source code”, “executable”, and “open source software” that participants could refer back to at any time. Participants were not required to answer every question in order to proceed through the survey.

3.2 Filtering and Exclusion Criteria

We collected 330 responses to an online survey of software usage and sharing practices and perceptions from late January to early April of 2017. We excluded participants who started the survey but did not answer questions beyond the demographic section, leaving 215 unique responses. Though the majority of our participants indicated that they were from academia (Table 1), we did not exclude any participant due to institution type because of the possibility that participants could be affiliated with an academic or research program while conducting work in another sector. Institution names and disciplines were canonicalized (e.g. UCB and uc berkeley were mapped to UC Berkeley).
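The between-group comparisons reported in Sections 5 through 7 are chi-square tests computed over these cleaned response counts. The snippet below is a minimal sketch of this style of analysis in Python, assuming hypothetical response data and column names; it illustrates the approach rather than reproducing the study's actual analysis code, which is available in the repository linked above.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical, simplified responses; the study's real data and analysis code
# are in the repository cited above.
responses = pd.DataFrame({
    "discipline": ["computer science", "CS", "Biology", "Psychology", "Physics"],
    "writes_source_code": ["Yes", "Yes", "Yes", "No", "Yes"],
})

# Canonicalize free-text variants before tabulating (cf. mapping "UCB" and
# "uc berkeley" to "UC Berkeley" for institution names).
canonical = {"cs": "Computer Science", "computer science": "Computer Science"}
responses["discipline"] = (
    responses["discipline"].str.lower().map(canonical).fillna(responses["discipline"])
)

# Between-group comparison: computer science vs. all other disciplines,
# as a chi-square test of independence on the response counts.
responses["group"] = responses["discipline"].eq("Computer Science").map({True: "CS", False: "Non-CS"})
table = pd.crosstab(responses["group"], responses["writes_source_code"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2({dof}, N = {table.to_numpy().sum()}) = {chi2:.2f}, p = {p:.3f}")
```

Canonicalizing free-text answers before tabulating them keeps equivalent responses (e.g. different spellings of the same institution) from being counted separately.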
Table 1. Demographic breakdown for study participants.

Discipline (Count, Percentage): Computer Science (39, 18.3%); Biology (29, 13.6%); Psychology (28, 13.1%); Engineering (13, 6.1%); Interdisciplinary Programs (12, 5.6%); Mathematics (12, 5.6%); Physics (12, 5.6%); Earth Science (9, 4.2%); Library Sciences (9, 4.2%); Social Sciences (9, 4.2%); Others (41, 19.2%)

Institution (Count, Percentage): Academic: Research Focused (164, 77.0%); Academic: Teaching Focused (22, 10.3%); Government (13, 6.1%); Nonprofit (7, 3.3%); Academic: Medical School (3, 1.4%); Commercial (2, 0.9%); Other (2, 0.9%)

Role (Count, Percentage): Graduate Student (67, 31.5%); Postdoc (38, 17.8%); Research Faculty (35, 16.4%); Staff (29, 13.6%); Principal Investigator (25, 11.7%); Research Assistant (10, 4.7%); Undergraduate Student (2, 0.9%); Research (1, 0.5%); Other (6, 2.8%)

Highest degree (Count, Percentage): Doctorate (110, 51.9%); Masters (72, 34.0%); Bachelors (26, 12.3%); High school (3, 1.4%); Professional degree (1, 0.5%)

4 PARTICIPANT DEMOGRAPHICS

We asked participants about their age, professional degrees, professional title (or role), institutional affiliation, institution type, and sources of funding. The majority of these questions were multiple choice with an option for open response upon selecting “Other”.

The mean and median age of our participants were 35.8 and 33 years old, respectively. Reflecting the ubiquity of software within the research enterprise, participants were drawn from a wide variety of research disciplines, institution types, and roles. As shown in Table 1, the disciplines most represented in our sample were computer science, biology, and psychology. The majority of our participants were drawn from 129 different research-focused academic institutions (including 12% out of 215 researchers from UC Berkeley). Table 1 also shows that participants had a range of degrees and roles, with the most common being doctorate (51.9%, N = 215) and graduate student (31.5%, N = 215), respectively. In terms of funding, the most common responses were the National Science Foundation (NSF) (16.7%, N = 215) and the National Institutes of Health (7.0%, N = 215).

5 CHARACTERISTICS AND USE OF RESEARCH SOFTWARE

Before diving deeper into how researchers use their software, we wanted to identify its characteristics. In this section, we describe responses to questions related to the creation and use of source code and executables.

5.1 Source Code and Executables

We asked participants about the generation and use of source code and executables (i.e. Do you write source code? Do you use source code written by others? Do you create executables? Do you use executables created by others?).
We found that 84.2% out of 215 responding participants write source code and 89.8% out of 215 use source code written by others, while 53.7% out of 214 create executables and 80.4% use executables written by others.

Figure 1 shows that participants from computer science were significantly more likely to write source code [χ²(2, N = 215) = 8.93, p < 0.05], create executables [χ²(2, N = 214) = 22.67, p < 0.00001], and use executables created by others [χ²(2, N = 214) = 6.66, p < 0.05] than participants from other disciplines. Comparisons related to the use of others' source code did not reach statistical significance [χ²(2, N = 215) = 1.21, p = 0.55].

We also asked participants about the type of software they use (i.e. Do you use commercial software in the course of your research? Do you use open source software in the course of your research?). As shown in Figure 2, more participants indicated that they use open source software (94.9%, N = 214) than commercial software (72.8%, N = 214).

5.2 Programming Languages

In order to quantify the breadth of programming languages used in a research setting, we asked participants about the languages they use when writing their own code. Table 2 shows the top ten languages, which together account for 86.4% of the languages selected. The top used languages in our sample were Python, R, SQL, JavaScript, C++, Matlab, Java, C, PHP, and Perl. Python and R were the most used languages, selected by 64.0% and 57.0% of participants, respectively. For the most part, these results are in line with previous findings from Hucka and Graham (Hucka and Graham (2016)) and also match those of a recent study from Stack Overflow (Inc. (2016)). In total, 52 different languages were chosen, with the most common responses outside of the top ten being Ruby, C#, ASP, SAS, XML, XQuery, and Julia. Quantitatively measuring the use of programming languages in academic research is difficult due to the variability of reporting practices (Howison and Bullard (2015a)), but our results are largely in line with the rapid ascent of R and Python as tools for data science.

Table 2. The top 10 programming languages used by the researchers in our sample. A total of 214 participants answered this question. Together these languages represent 86.4% of the languages selected. Note that participants could choose more than one language.
Python: 137 selections (64.0%); R: 122 (57.0%); SQL: 60 (28.0%); JavaScript: 57 (26.6%); C++: 54 (25.2%); Matlab: 45 (21.0%); Java: 35 (16.4%); C: 25 (11.7%); PHP: 25 (11.7%); Perl: 21 (9.8%)

We also inquired about collaborative code development and the extent to which the same programming languages are used within a lab or research group. Though 53.3% of participants indicated that they write code collaboratively, we were surprised to see that only 33.0% indicated that everyone in the lab used the same language(s).
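Because participants could select more than one language, the counts in Table 2 are tallied per selection while the percentages are expressed per participant. The following sketch, using hypothetical multi-select responses rather than the survey data, shows how such a tally can be computed:

```python
import pandas as pd

# Hypothetical multi-select responses: each participant may list several languages.
languages = pd.Series([
    ["Python", "R"],
    ["Python", "C++", "SQL"],
    ["R"],
])

counts = languages.explode().value_counts()              # selections per language
percentages = (counts / len(languages) * 100).round(1)   # percent of participants
print(pd.DataFrame({"Selection": counts, "Percentage": percentages}))
```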
5.3 Use of Research Software

Previous scholarship (e.g. Borgman et al. (2012)) has indicated that researchers use software for a wide variety of purposes. To examine the purposes of research software, we asked participants about how they use their code or software (Figure 3). This question allowed them to choose multiple answers from a suggested list or input other answers.

Figure 3(a) shows that our participants use software primarily to analyze data, visualize data, clean and organize data, automate their work, and collect data. A total of 104 participants (55.7% out of 212 participants) responded that they use software for all five. “Other” responses included running simulations, building models, researching algorithms, testing methods, writing compilers, and sharing and publishing data. We also asked if researchers repurpose their code (i.e. using it for a project other than the one for which it was originally created) and found that 82% out of 208 participants indicated that they do so.

Figure 1. Significantly more participants from computer science stated that they write source code, create executables, and use executables created by others than participants from other disciplines. Panels (CS vs. non-CS): (a) Do you write source code? N = 215. (b) Do you use source code written by others? N = 215. (c) Do you create executables? N = 214. (d) Do you use executables created by others? N = 214.

Figure 2. The use of open source software versus commercial software. N = 214.

Figure 3. The purpose of using research software. Note that the first question could be answered with more than one choice. Panels: (a) How do you use code or software? N = 212. (b) Have you ever repurposed your code or software? N = 208.

We investigated how researchers collaborate on code writing within their research labs (Figure 4) (e.g. “Do you write code collaboratively (i.e. with another person or multiple people)?”, “Does everyone in your lab or research group write code using the same programming language(s)?”). We found that 49.8% (N = 200) of researchers write code collaboratively (Figure 4(b)), while only 30% (N = 201) use the same coding language in their research labs (Figure 4(a)).

Previous studies on the reuse of research software have focused mainly on licensing, review of code, and user awareness (Joppa et al. (2013); Morin et al. (2012a)). Reinforcing the need to establish best practices (or “good enough” practices, e.g. Wilson et al. (2017)) akin to those related to the management of research data, 79.8% of our participants (N = 208) indicated that they repurpose their code.

In an open response question, we asked participants to describe, in their own words, how they use their software and code. Here, again, participants indicated that they use software for a wide variety of purposes. One participant summed up the relationship between software and their research succinctly as “I use software for stimulus presentation, data acquisition, and data analysis and visualization - basically my entire research is run via computers (and thus code).” Similarly, another participant described the application of software within the field of computer science: “As a computer scientist, almost every aspect of my research from grant proposal to collecting data to analyzing data to writing up my results involves software. I write software.
I use software my collaborators or students write as well as open source and commercial software.”

6 REPRODUCIBILITY-RELATED PRACTICES

To understand how the practices of our participants align with those related to computational reproducibility, we asked a number of questions about adding comments to source code, generating documentation, communicating information about dependencies, and using “notebook” applications such as Jupyter. We also asked about awareness of coding conventions and best practices. The results of these questions are shown in Figure 5.

In line with previous research (Hannay et al. (2009); Joppa et al. (2013); Prabhu et al. (2011)), only 53.4% (N = 215) of our participants indicated that they have received formal training in coding conventions or best practices. At the same time, we found that many actually employ practices that are commonly cited for establishing computational reproducibility. For example, when asked “Do you include comments in your code?” and “When you share your code or software, do you provide information about dependencies?”, the majority of participants indicated that they include comments (98.0%, N = 204) and provide information about dependencies (72.2%, N = 169), respectively. However, substantially fewer indicated that they employ other practices such as generating documentation (60.0%, N = 205). While electronic lab notebooks have been cited as a tool for ensuring reproducibility (Kluyver et al. (2016)), only 43.6% (N = 206) of our participants indicated that they use them to write code.

Figure 4. Consistency of programming languages within research groups. Panels: (a) Does everyone in your lab or research group write code using the same programming language(s)? N = 201. (b) Do you write code collaboratively? N = 200.

Figure 5. Reproducibility practices in research. Panels (CS vs. non-CS): (a) Have you received training in coding conventions or best practices? N = 215. (b) Do you include comments in your code? N = 204. (c) Do you generate documentation for your code? N = 205. (d) Do you write code using a notebook? N = 206.

Figure 6. CS researchers tend to provide information about dependencies more than other disciplines. Panels: (a) When you share your code or software, do you share it alongside related files (e.g. datasets)? N = 161. (b) When you share your code or software, do you provide information about dependencies? N = 169.

Comparisons of responses by discipline (e.g. computer science versus others) or location (e.g. UC Berkeley versus others) were not significant, even, surprisingly, on questions related to training [discipline: χ²(1, N = 215) = 1.58, p = 0.21; location: χ²(2, N = 215) = 0.00, p = 1.00] (Figure 5). The lone exception was in providing information about dependencies. Significantly more respondents from computer science reported that they include information about dependencies when they share their code than participants from other disciplines [χ²(2, N = 169) = 17.755, p < 0.001].
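Dependency information of the kind asked about here can be recorded in many ways. As one hedged illustration (not a practice drawn from the survey itself), the snippet below writes the packages present in the current Python environment to a requirements-style file; tools such as pip, conda, or the packaging and containerization platforms discussed in Section 2 provide more complete solutions.

```python
# Minimal sketch (assumes Python 3.8+ for importlib.metadata): record the
# installed packages and their versions alongside shared code so that others
# can recreate a comparable environment.
from importlib.metadata import distributions

versions = {dist.metadata["Name"]: dist.version
            for dist in distributions() if dist.metadata["Name"]}
with open("requirements.txt", "w") as f:
    for name in sorted(versions, key=str.lower):
        f.write(f"{name}=={versions[name]}\n")
```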
7 SHARING AND PRESERVATION OF RESEARCH SOFTWARE

Making materials available for others to evaluate, use, and build upon is an essential component of ensuring reproducibility. Much of the previous work examining the sharing of research software has focused on the degree to which software is cited and described irregularly in the scholarly literature (Howison and Bullard (2015a); Smith et al. (2016)) and the relationship between code sharing and research impact (Vandewalle (2012)). In order to gain a greater understanding of how sharing practices relate to reproducibility, we asked our participants a variety of questions about how they share, find, and preserve software.

7.1 Sharing Research Software

Sharing Practices

While only half (50.5%, N = 198) of our participants indicated that they were aware of related community standards in their field or discipline, the majority indicated that they share software as part of the research process (computer science: 84.9%, other disciplines: 81.1%, N = 187) (Figure 7). Of 189 participants, 31% indicated that there were reasons their software could not be shared (Figure 7(b)). The most commonly cited restrictions on sharing were the inclusion of sensitive data, intellectual property concerns, and the time needed to prepare code for sharing. Comparisons between computer science and other disciplines on the sharing of code were not statistically significant [χ²(2, N = 187) = 1.58, p = 0.45].

We also checked whether participants share new versions of their code and found that 81% (N = 156) do so using a version control system. A group comparison related to the sharing of new versions was not statistically significant [CS vs non-CS: χ²(2, N = 156) = 2.2, p > 0.05] (Figure 7(c)); however, significantly more participants from computer science indicated that they share their code via a version control system than those from other disciplines [χ²(2, N = 185) = 16.4, p < 0.05] (Figure 7(d)).

Figure 7. Practices of code sharing. Panels (CS vs. non-CS): (a) Do you share the code or software created as part of your research? N = 187. (b) Is there any reason your code or software could not be shared? N = 189. (c) If you make a change to your code, do you share a new version? N = 156. (d) Do you use a version control system (e.g. Git, SVN)? N = 185.

Figure 8. Methods and formats for sharing software. Note that both of these questions could be answered with more than one response. Panels: (a) How do you tell people about the code or software you've shared? N = 165. (b) In what format do you typically share your code? N = 175.
Sharing Format and Platform

We asked our participants about how they share their code and found that 75.3% of 175 participants share their software in the form of source code, 7.6% share executables only, and 17.1% share both formats (Figure 8). As shown in Figure 8(a), participants indicated that they share their software through a variety of channels, with the most common being e-mail. The figure shows that 73.94% of the time our participants make their code available through direct communication and 50% make their code available through social media platforms. The participants who indicated that they use methods other than those listed in our survey generally responded that they do so using platforms such as Github or the Open Science Framework. A few researchers mentioned that they save their code along with the dataset in their institutional repository, while others indicated that they publicize their code via conferences.

7.2 Preserving Research Software

We asked a variety of questions about preserving research software (i.e. Do you take any steps to ensure that your code or software is preserved over the long term? How long do you typically save your code or software? Where do you save your code or software so that it is preserved over the long term?). While research software is a building block for ensuring reproducibility, 39.9% of participants (N = 183) do not prepare their code for long-term preservation.

How long do you typically save your code or software?

Figure 9(a) shows that the largest share of our participants (40.4% out of 162) preserve their code for more than eight years, but generally not in a way that maintains its use. In contrast, 7.4% (N = 162) of participants keep their code only until it is described in a publication, poster, or presentation. We found that 10.5% out of 162 researchers tend to keep their code for 3 years or less and 19.8% tend to keep their code for 4-8 years. Only 21.0% out of 162 researchers tend to keep their code for 8 years or more while maintaining it for future access and use.

Where do you save your code or software so that it is preserved over the long term?

In terms of where our participants preserve their code, Figure 9(b) shows that 76.2% of the time participants use code hosting sites such as Github. About 56.4% of the time, researchers use hard drives or external storage to preserve their code, and 38.1% of the time they preserve their code by putting it on the cloud. Only 12.7% of our participants indicated that they use archival repositories (e.g. figshare). The participants who entered “other” responses mentioned that they use a lab backup system, an organizational archive (e.g., a university server), their own PC, a language package registry (CRAN, PyPI, or similar), an internal SVN repository, or project-specific websites.

Figure 9. 76.2% of researchers use Github for preserving their code. Panels: (a) How long do you typically save your code or software? N = 162. (b) Where do you save your code or software so that it is preserved over the long term? N = 182.
Note that the second question could be answered with more than one choice.

We asked participants to define sharing and preserving in their own words. Their responses generally indicated that they make a distinction between the two concepts. For example, one participant stated succinctly, “sharing is making code available to others, in a readily usable form. Preserving is ensuring to the extent practical that the code will be usable as far into the future as possible.” However, several responses indicated that participants did not necessarily regard preservation as an active process that continues even after the conclusion of a particular project (e.g. “sharing means giving access to my code to someone else. Preserving means placing my code somewhere where it can remain and I will not delete it to save room or lose it when I switch computers or suffer a hard drive failure.”). In contrast, other responses indicated that participants were aware that preservation is important for reuse purposes and had a knowledge of preservation tools. For example, one researcher defined preserving software as “branching so that code remains compatible with different versions of overarching libraries (in my case) or with new coding standards and compilers”, and another stated “Preserving should be done via a system like LOCKSS that ensures that provides for redundancy. Sharing can be done via the web, but must include a license so that recipients know about their rights.”

8 DISCUSSION

Scholars throughout the humanities and sciences depend on software for a wide variety of purposes, including the collection, analysis, and visualization of data (Borgman et al. (2012); Hey et al. (2009)). Though ubiquitous, software presents significant challenges to efforts aimed at ensuring reproducibility. Our results demonstrate that researchers not only create and use software in a wide variety of forms and for a wide variety of purposes, but also that their software-related practices are often not completely in line with those associated with reproducibility. In particular, our results demonstrate that, while scholars often save their software for long periods of time, many do not actively preserve or maintain it. This perspective is perhaps best encapsulated by one of our participants who, when completing our open response question about the definition of sharing and preserving software, wrote “Sharing means making it publicly available on Github. Preserving means leaving it on GitHub”. We share this anecdote not to criticize our participants or their practices, but to illustrate the outstanding need for support services related to software.

In the broader scholarly communications space, there are several prominent frameworks that relate to the reproducibility of scholarly outputs. As part of an effort to advance data as a “first class” research product, the FAIR (Findable, Accessible, Interoperable, and Reusable) guidelines provide a measurable set of principles related to the management and sharing of research data (Wilkinson et al. (2016)). The FAIR principles are general enough that they can, with some modification, also be applied to software (Jimenez et al. (2017)). At the level of scholarly publications, the TOP (Transparency and Openness Promotion) guidelines (Nosek et al. (2015)) address citation standards and the availability of research materials including data and software.
A supplement to TOP, the Reproducibility Enhancement Principles (REP) (Stodden et al. (2016)), specifically targets disclosure issues related to computation and software. However, our results support previous work indicating that software still mostly exists outside the reputation economy of science (Howison and Herbsleb (2011)), which suggests that a more education-based approach, one that provides guidance about software before the publication stage, is necessary.

The majority of our participants indicated that they view code or software as a “first class” research product that should be assessed, valued, and shared in the same way as a journal article. However, our results also indicate that there remains a significant gap between this perception and actual practice. The fact that our participants indicated that they create and use software in a wide variety of forms and for a wide variety of purposes demonstrates the significant technical challenges inherent in ensuring computational reproducibility. In contrast, the lack of active preservation and the tendency to share software outside traditional (and measurable) scholarly communications channels displayed by our sample demonstrate the social and behavioral challenges. A significant difficulty in ensuring computational reproducibility is that researchers oftentimes do not treat their software as a “first class” research product. These findings reinforce the need for programs to train researchers in how to maintain their code during the active phase of their research.

At present, there are a number of initiatives focused on addressing the preservation and reproducibility of software. In the United States, the Software Preservation Network (SPN) (Meyerson et al. (2017)) represents an effort to coordinate efforts to ensure long-term access to software. The focus of SPN is generally on cultural heritage software rather than research software, but their work delineating issues related to metadata, governance, and technical infrastructure has substantial overlap with what is required for research software. In the United Kingdom, the Software Sustainability Institute trains researchers on how to develop better software and make better use of the supporting infrastructure (Crouch et al. (2013)). Befitting the necessity of training and preservation indicated by our study, a similar effort, the US Software Sustainability Initiative, was recently awarded funding by the National Science Foundation (NSF Award #1743188). While it is likely not possible for academic institutions to offer support services that cover the broad range of programming languages and applications described in our survey results, collaborating with such groups to create guidance and best practice recommendations may be a feasible first step in engaging with researchers about their software and code in the same manner as many research data management (RDM) initiatives now engage with them about their data.

While research stakeholders including academic institutions, publishers, and funders have an interest in tackling issues of computational reproducibility in order to ensure the integrity of the research process, our results demonstrate the complexity of doing so. One participant summed up why their code could not be made re-usable: “Most of my coding is project specific and not reusable between projects because the datasets I encounter are very variable.
I typically only generate packages for tasks such as getting data from a database (e.g., PubMed) and keeping RMarkdown templates in an orderly way.”

9 CONCLUSION AND FUTURE WORK

In this paper, we presented the results of surveying researchers across different institutions about software usage, sharing, and preservation. We also examined the practices used to manage software for ensuring the reproducibility and integrity of scientific research. Our results point to several interesting trends, including the widespread writing of source code and use of source code written by others, the variety of programming languages used and the lack of consistency even within the same lab or research group, the use of open source software over commercial software, and the adoption of some practices that help assure computational reproducibility, such as adding comments and documentation to code, but not others, most notably the general lack of active preservation. The findings of this paper inform ongoing conversations about research software and reproducibility by documenting current practices around research software. This will help service providers deliver tools and systems that help researchers manage their code and support the integrity and reproducibility of the scholarly ecosystem.

The present study was designed to capture a broad picture of how researchers use and share their software. For this reason, we were not able to provide a particularly granular picture of how individual practices relate to reproducible science outcomes. For example, while the majority of our participants responded that they include comments in their source code and generate documentation for their software, we were not able to make any judgment about whether or not the contents of these comments and documentation are sufficient to ensure reproducibility. Follow-up research is needed in order to gain a more nuanced understanding of how processes related to the creation and use of research software relate to reproducibility. However, despite these limitations, our results indicate several potential directions for future library services centered on helping researchers create, use, and share their software and assure computational reproducibility.

ACKNOWLEDGMENTS

We would like to thank our colleagues at the UC Berkeley Library and the California Digital Library for their valuable suggestions and insightful comments throughout this project.

REFERENCES

AlNoamany, Y. and Borghi, J. A. (2018a). Data: Researcher perspectives on the use and sharing of software.
AlNoamany, Y. and Borghi, J. A. (2018b). Software study code.
Barnes, N. (2010). Publish your computer code: it is good enough. Nature, 467(7317):753–753.
Boettiger, C. (2015). An introduction to Docker for reproducible research. ACM SIGOPS Operating Systems Review, 49(1):71–79.
Borgman, C. L., Wallis, J. C., and Mayernik, M. S. (2012). Who's got the data? Interdependencies in science and technology collaborations. Computer Supported Cooperative Work (CSCW), 21(6):485–523.
Chirigati, F., Shasha, D., and Freire, J. (2013). ReproZip: Using provenance to support computational reproducibility. In Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance, TaPP '13, pages 1–4. USENIX Association.
Crouch, S., Hong, N. C., Hettrick, S., Jackson, M., Pawlik, A., Sufi, S., Carr, L., Roure, D. D., Goble, C., and Parsons, M. (2013).
The Software Sustainability Institute: Changing research software attitudes and practices. Computing in Science & Engineering, 15(6):74–80.
Eglen, S. J., Marwick, B., Halchenko, Y. O., Hanke, M., Sufi, S., Gleeson, P., Silver, R. A., Davison, A. P., Lanyon, L., Abrams, M., Wachtler, T., Willshaw, D. J., Pouzat, C., and Poline, J.-B. (2017). Toward standard practices for sharing computer code and programs in neuroscience. Nature Neuroscience, 20:770.
Fecher, B., Friesike, S., and Hebing, M. (2015). What drives academic data sharing? PLOS ONE, 10(2):1–25.
Goble, C. (2014). Better software, better research. IEEE Internet Computing, 18(5):4–8.
Goodman, S. N., Fanelli, D., and Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341):341ps12–341ps12.
Hafer, L. and Kirkpatrick, A. E. (2009). Assessing open source software as a scholarly contribution. Communications of the ACM, 52(12):126–129.
Hannay, J. E., MacLeod, C., Singer, J., Langtangen, H. P., Pfahl, D., and Wilson, G. (2009). How do scientists develop and use scientific software? In Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering, SECSE '09, pages 1–8. IEEE Computer Society.
Hey, T., Tansley, S., and Tolle, K. (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.
Hong, N. C. (2011). Digital preservation and curation: the danger of overlooking software. The Preservation of Complex Objects, page 25.
Hong, N. C. (2014). Dealing with software: the research data issues.
Howison, J. and Bullard, J. (2015a). How is software visible in the scientific literature? Technical report, Univ. of Texas.
Howison, J. and Bullard, J. (2015b). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology.
Howison, J. and Herbsleb, J. D. (2011). Scientific software production: Incentives and collaboration. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW '11, pages 513–522. ACM.
Howison, J. and Herbsleb, J. D. (2013). Incentives and integration in scientific software production. In Proceedings of the ACM 2013 Conference on Computer Supported Cooperative Work, CSCW '13, pages 459–470. ACM.
Hucka, M. and Graham, M. J. (2016). Software search is not a science, even among scientists. CoRR, abs/1605.02265.
Inc., S. E. (2016). Developer survey results 2016.
Ince, D. C., Hatton, L., and Graham-Cumming, J. (2012). The case for open computer programs. Nature, 482:485.
Jimenez, R., Kuzak, M., Alhamdoosh, M., Barker, M., Batut, B., Borg, M., Capella-Gutierrez, S., Chue Hong, N., Cook, M., Corpas, M., Flannery, M., Garcia, L., Gelpí, J., Gladman, S., Goble, C., González Ferreiro, M., Gonzalez-Beltran, A., Griffin, P., Grüning, B., Hagberg, J., Holub, P., Hooft, R., Ison, J., Katz, D., Leskošek, B., López Gómez, F., Oliveira, L., Mellor, D., Mosbergen, R., Mulder, N., Perez-Riverol, Y., Pergl, R., Pichler, H., Pope, B., Sanz, F., Schneider, M., Stodden, V., Suchecki, R., Svobodová Vařeková, R., Talvik, H., Todorov, I., Treloar, A., Tyagi, S., van Gompel, M., Vaughan, D., Via, A., Wang, X., Watson-Haigh, N., and Crouch, S. (2017).
Four simple recommendations to encourage best practices in research software [version 1; referees: 3 approved]. F1000Research, 6(876).
Joppa, L. N., McInerny, G., Harper, R., Salido, L., Takeda, K., O'Hara, K., Gavaghan, D., and Emmott, S. (2013). Troubling trends in scientific software use. Science, 340(6134):814–815.
Katz, D. S., Allen, G., Hong, N. C., Parashar, M., and Proctor, D. (2013). First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE): Submission and peer-review process, and results. arXiv preprint arXiv:1311.3523.
Kim, Y. and Stanton, J. M. (2016). Institutional and individual factors affecting scientists' data-sharing behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology, 67(4):776–799.
Kissel, R., Blank, R., and Secretary, A. (2011). Glossary of key information security terms. NIST Interagency Report NISTIR 7298 Revision 1, National Institute of Standards and Technology.
Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., Willing, C., and the Jupyter development team (2016). Jupyter notebooks: A publishing format for reproducible computational workflows. In Loizides, F. and Schmidt, B., editors, Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87–90. IOS Press.
Kratz, J. E. and Strasser, C. (2015). Researcher perspectives on publication and peer review of data. PLOS ONE, 10(2):1–21.
Marwick, B. (2017). Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory, 24(2):424–450.
McCarthy, D. J., Humburg, P., Kanapin, A., Rivas, M. A., Gaulton, K., Cazier, J.-B., and Donnelly, P. (2014). Choice of transcripts and software has a large effect on variant annotation. Genome Medicine, 6(3):26.
Meyerson, J., Vowell, Z., Hagenmaier, W., Leventhal, A., Roke, E. R., Rios, F., and Walsh, T. (2017). The Software Preservation Network (SPN): A community effort to ensure long term access to digital cultural heritage. D-Lib Magazine, 23(5/6).
Monteith, J. Y., McGregor, J. D., and Ingram, J. E. (2014). Scientific research software ecosystems. In Proceedings of the 2014 European Conference on Software Architecture Workshops, ECSAW '14, pages 9:1–9:6. ACM.
Morin, A., Urban, J., Adams, P. D., Foster, I., Sali, A., Baker, D., and Sliz, P. (2012a). Shining light into black boxes. Science, 336(6078):159–160.
Morin, A., Urban, J., and Sliz, P. (2012b). A quick guide to software licensing for the scientist-programmer. PLOS Computational Biology, 8(7):1–7.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., and Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1):0021.
NIH (2016). Strategies for NIH data management, sharing, and citation.
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, S., Chambers, C. D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J., Glennerster, R., Goroff, D., Green, D.
P., Hesse, B., Humphreys, M., Ishiyama, J., Karlan, D., Kraut, A., Lupia, A., Mabry, P., Madon, T., Malhotra, N., Mayo-Wilson, E., McNutt, M., Miguel, E., Paluck, E. L., Simonsohn, U., Soderberg, C., Spellman, B. A., Turitto, J., VandenBos, G., Vazire, S., Wagenmakers, E. J., Wilson, R., and Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242):1422–1425.
Nosek, B. A., Spies, J. R., and Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6):615–631.
Pan, X., Yan, E., and Hua, W. (2016). Disciplinary differences of software use and impact in scientific literature. Scientometrics, 109(3):1593–1610.
Perez, F. and Granger, B. E. (2007). IPython: A system for interactive scientific computing. Computing in Science & Engineering, 9(3):21–29.
Piccolo, S. R. and Frampton, M. B. (2016). Tools and techniques for computational reproducibility. GigaScience, 5(1):30.
Prabhu, P., Jablin, T. B., Raman, A., Zhang, Y., Huang, J., Kim, H., Johnson, N. P., Liu, F., Ghosh, S., Beard, S., Oh, T., Zoufaly, M., Walker, D., and August, D. I. (2011). A survey of the practice of computational science. In State of the Practice Reports, SC '11, pages 19:1–19:12. ACM.
Prlić, A. and Procter, J. B. (2012). Ten simple rules for the open development of scientific software. PLOS Computational Biology, 8(12):e1002802.
Ram, K., Katz, D., Carver, J., Gesing, S., and Weber, N. (2017). SI2-S2I2 conceptualization: Conceptualizing a US research software sustainability institute (URSSI).
Rios, F. (2016). The pathways of research software preservation: An educational and planning resource for service development. D-Lib Magazine, 22(7/8).
Sadowski, C., Stolee, K. T., and Elbaum, S. (2015). How developers search for code: a case study. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pages 191–201. ACM.
Sandve, G. K., Nekrutenko, A., Taylor, J., and Hovig, E. (2013). Ten simple rules for reproducible computational research. PLOS Computational Biology, 9(10):1–4.
Smith, A. M., Katz, D. S., and Niemeyer, K. E. (2016). Software citation principles. PeerJ Computer Science, 2:e86.
Stodden, V. (2009). The legal framework for reproducible scientific research: Licensing and copyright. Computing in Science & Engineering, 11(1):35–40.
Stodden, V., Guo, P., and Ma, Z. (2013). Toward reproducible computational research: An empirical analysis of data and code policy adoption by journals. PLOS ONE, 8(6):1–8.
Stodden, V., Leisch, F., and Peng, R. D. (2014). Implementing Reproducible Research. CRC Press.
Stodden, V., McNutt, M., Bailey, D. H., Deelman, E., Gil, Y., Hanson, B., Heroux, M. A., Ioannidis, J. P., and Taufer, M. (2016). Enhancing reproducibility for computational methods. Science, 354(6317):1240–1241.
Teal, T. K., Cranston, K. A., Lapp, H., White, E., Wilson, G., Ram, K., and Pawlik, A. (2015). Data Carpentry: workshops to increase data literacy for researchers. International Journal of Digital Curation, 10(1):135–143.
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., Manoff, M., and Frame, M. (2011). Data sharing by scientists: practices and perceptions. PLOS ONE, 6(6):e21101.
Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., Pollock, D., and Dorsett, K.
(2015). Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLOS ONE, 10(8):1–24.
Thain, D., Ivie, P., and Meng, H. (2015). Techniques for preserving scientific software executions: Preserve the mess or encourage cleanliness? In Proceedings of the 12th International Conference on Digital Preservation (iPRES).
Vandewalle, P. (2012). Code sharing is associated with research impact in image processing. Computing in Science & Engineering, 14(4):42–47.
Wellcome (2017). Policy on data, software and materials management and sharing.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., 't Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3:160018.
Wilson, G. (2006). Software Carpentry: Getting scientists to write better code by making them more productive. Computing in Science & Engineering, 8(6):66–69.
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., and Teal, T. K. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6):1–20.