Top 10 FAIR Data & Software Things

February 1, 2019

Sprinters: Reid Otsuji, Stephanie Labou, Ryan Johnson, Guilherme Castelao, Bia Villas Boas, Anna-Lena Lamprecht, Carlos Martinez Ortiz, Chris Erdmann, Leyla Garcia, Mateusz Kuzak, Paula Andrea Martinez, Liz Stokes, Natasha Simons, Tom Honeyman, Sharyn Wise, Josh Quan, Scott Peterson, Amy Neeser, Lena Karvovskaya, Otto Lange, Iza Witkowska, Jacques Flores, Fiona Bradley, Kristina Hettne, Peter Verhaar, Ben Companjen, Laurents Sesink, Fieke Schoots, Erik Schultes, Rajaram Kaliyaperumal, Erzsebet Toth-Czifra, Ricardo de Miranda Azevedo, Sanne Muurling, John Brown, Janice Chan, Lisa Federer, Douglas Joubert, Allissa Dillman, Kenneth Wilkins, Ishwar Chandramouliswaran, Vivek Navale, Susan Wright, Silvia Di Giorgio, Akinyemi Mandela Fasemore, Konrad Förstner, Till Sauerwein, Eva Seidlmayer, Ilja Zeitlin, Susannah Bacon, Katie Hannan, Richard Ferrers, Keith Russell, Deidre Whitmore, and Tim Dennis.

Organisations: Library Carpentry/The Carpentries, Australian Research Data Commons, Research Data Alliance Libraries for Research Data Interest Group, FOSTER Open Science, OpenAire, Research Data Alliance Europe, Data Management Training Clearinghouse, California Digital Library, Dryad, AARNet, Center for Digital Scholarship at the Leiden University, DANS, The Netherlands eScience Center, Utrecht University, UC San Diego, Dutch Techcentre for Life Sciences, EMBL, University of Technology, Sydney, UC Berkeley, University of Western Australia, Leiden University, GO FAIR, DARIAH, Maastricht University, Curtin University, NIH, NLM, NCBI, ZB MED, CSIRO, and UCLA.
TOP 10 FAIR DATA & SOFTWARE THINGS: Table of Contents

About
Oceanography
Research Software
Research Libraries
Research Data Management Support
International Relations
Humanities: Historical Research
Geoscience
Biomedical Data Producers, Stewards, and Funders
Biodiversity
Australian Government Data/Collections
Archaeology

TOP 10 FAIR DATA & SOFTWARE THINGS: About

The Top 10 FAIR Data & Software Global Sprint was held online over two days (29-30 November 2018), where participants from around the world were invited to develop brief guides (stand-alone, self-paced training materials), called "Things", that can be used by the research community to understand FAIR in different contexts and as starting points for conversations around FAIR. The idea for "Top 10 Data Things" stems from initial work done at the Australian Research Data Commons or ARDC (formerly known as the Australian National Data Service).

The Global Sprint was organised by Library Carpentry, the Australian Research Data Commons and the Research Data Alliance Libraries for Research Data Interest Group in collaboration with FOSTER Open Science, OpenAire, RDA Europe, the Data Management Training Clearinghouse, California Digital Library, Dryad, AARNet, the Center for Digital Scholarship at Leiden University, and DANS. Anyone could join the Sprint, and roughly 25 groups/individuals participated from The Netherlands, Germany, Australia, United States, Hungary, Norway, Italy, and Belgium. See the full list of registered Sprinters: https://docs.google.com/spreadsheets/d/1QQ7Mpxp5ORUE6wheWaC0HXXfiD_G54vVkW1DMMtUM6M/edit

Sprinters worked from a primer provided in advance (https://docs.google.com/document/d/1TwJyButvAVEz5tCq_bdzD6kdKMvy0wiVLuE3uNbR7Bs/edit), together with an online ARDC webinar introducing FAIR and the Sprint, "Ready, Set, Go! Join the Top 10 FAIR Data Things Global Sprint" (https://www.slideshare.net/AustralianNationalDataService/ready-set-go-join-the-top-10-fair-data-things-global-sprint). Groups/individuals developed their Things in Google Docs, which could be accessed and edited by all participants. The Sprinters also used a Zoom channel, provided by ARDC, for online calls and coordination, and a Gitter channel (https://gitter.im/LibraryCarpentry/Top10FAIR), provided by Library Carpentry, to chat with each other throughout the two days. In addition, participants used the Twitter hashtag #Top10FAIR to communicate with the broader community, sometimes including images of the day.

Participants greeted each other throughout the Sprint and created an overall welcoming environment. As the Sprint shifted to different time zones, it was a chance for participants to catch up. The Zoom and Gitter channels were a way for many to connect over FAIR but also discuss other topics.
A number of participants did not know what to expect from a Library Carpentry/Carpentries-like event but found a welcoming environment where everyone could participate. The Top 10 FAIR Data & Software Things repository and website (https://librarycarpentry.org/Top-10-FAIR/) host the work of the Sprinters and are meant to be an evolving resource. Members of the wider community can submit issues and/or pull requests to the Things to help improve them. In addition, a published version of the Things will be made available via Zenodo and the Data Management Training Clearinghouse in February 2019.

TOP 10 FAIR DATA & SOFTWARE THINGS: Oceanography

Sprinters: Reid Otsuji, Stephanie Labou, Ryan Johnson, Guilherme Castelao, Bia Villas Boas (UC San Diego)

Table of contents

Findability:
Thing 1: Data repositories
Thing 2: Metadata
Thing 3: Permanent Identifiers
Thing 4: Citations

Accessibility:
Thing 5: Data formats
Thing 6: Data Organization and Management
Thing 7: Re-usable data

Interoperability:
Thing 3: Permanent Identifiers
Thing 6: Data Organization and Management
Thing 2: Metadata
Thing 10: APIs and Apps

Reusability:
Thing 8: Tools of the trade
Thing 9: Reproducibility
Thing 10: APIs and Apps

Description: Oceanographic data encompasses a wide variety of data formats, file sizes, and states of data completeness. Data of interest may be available from public repositories, collected on an individual basis, or some combination of these, and each type has its own set of challenges. This "10 Things" guide introduces 10 topics relevant to making oceanographic data FAIR: findable, accessible, interoperable, and reusable.
Audience:
• Library staff and programmers who provide research support
• Oceanographers
• Oceanography data stewards
• Researchers, scholars and students in Oceanography

Goal: The goal of this lesson is to introduce oceanographers to FAIR data practices in their research workflow through 10 guided activities.

Things

Thing 1: Data repositories

There are numerous data repositories for finding oceanographic data. Many of these are run by official "data centers" and generally have well-organized and well-documented datasets available for free and public use.
• NSF / EarthCube: http://www.nsf.gov/geo/earthcube/
• CLIVAR - CCHDO: http://cchdo.ucsd.edu/
• CLIVAR - Hawaii ADCP: http://ilikai.soest.hawaii.edu/sadcp/clivar.html
• CLIVAR - JODC ADCP Data: http://www.jodc.go.jp/goin/adcp.html
• NOAA - NODC: http://www.nodc.noaa.gov/
• NOAA - NCDC: http://www.ncdc.noaa.gov/oa/ncdc.html
• NOAA - NGDC: http://www.ngdc.noaa.gov/
• NSIDC: http://nsidc.org/
• CDIAC: http://cdiac.ornl.gov/
• BCO-DMO: http://bcodmo.org/
• GEOTRACES: http://www.geotraces.org/
• R2R: http://www.rvdata.us/
• SAMOS: http://samos.coaps.fsu.edu/html/
• Argo Data: http://www.argo.ucsd.edu/Argo_data_and.html
• NASA - PO.DAAC: https://podaac.jpl.nasa.gov/
• World Ocean Database (WOD): https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html
• Spray Underwater Glider: https://spraydata.ucsd.edu/

At some point, you may want or need to deposit your own data into a data repository, so that others may find and build upon your work. Many funding agencies now require data collected or created with grant funds to be shared with the broader community. For instance, the National Science Foundation (NSF) Division of Ocean Sciences (OCE) mandates sharing of data as well as metadata files and any derived data products (https://www.nsf.gov/pubs/2017/nsf17037/nsf17037.jsp). Finding the "right" repository for your data can be overwhelming, but there are resources available to help pick the best location for your data. For instance, OCE has a list of approved repositories in which to submit final data products: https://www.nsf.gov/geo/oce/oce-data-sample-repository-list.jsp

Activity 1:
• Go to re3data.org (https://www.re3data.org/) and search for a data repository related to your research subject area. How many results did you get? Which of these repositories looks most relevant to your research area? Is it easy to find a dataset in those repositories covering the California coast (or any other region of your choice) during the last year?

Activity 2:
• What is the next journal you would like to publish in? (Alternatively: what is a top journal in your field?) Can you find the data submission requirements for this journal?

Thing 2: Metadata

High quality metadata (information about the data, such as creator, keywords, units, flags, etc.) significantly improves data discovery. While metadata most often describes the data itself, it can also include information about machines/instruments used, such as make, model, and manufacturer, as well as process metadata, which would include details about any cleaning/analysis steps and scripts used to create data products. Using controlled vocabularies in metadata allows for serendipitous discovery in user searches. Additionally, using a metadata schema to mark up a dataset can make your data findable to the world.
Activity 1:
• Using schema.org markup (https://schema.org/), look at the metadata elements pertaining to scholarly articles: https://schema.org/ScholarlyArticle. Imagine you have an article hosted on your personal website, and you would like to add markup so that it could be more readily indexed by Google Dataset Search (https://toolbox.google.com/datasetsearch). What metadata elements would be most important to include? (This resource will help you: https://developers.google.com/search/docs/data-types/dataset)
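If you want to experiment before editing a web page, a few lines of code can generate the JSON-LD block for you. Below is a minimal sketch of schema.org/Dataset markup built in Python; the dataset name, description, identifier, and other values are hypothetical placeholders, and the Google documentation linked above lists the recommended properties.

    import json

    # A minimal, hypothetical schema.org/Dataset record; replace all values.
    dataset_markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "Example CTD profiles, California coast",
        "description": "Temperature and salinity profiles collected in 2018.",
        "identifier": "https://doi.org/10.xxxx/example",  # placeholder DOI
        "creator": {"@type": "Person", "name": "A. Researcher"},
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "keywords": ["oceanography", "CTD", "temperature", "salinity"],
    }

    # Paste the output into your page inside a
    # <script type="application/ld+json"> ... </script> element.
    print(json.dumps(dataset_markup, indent=2))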
Activity 2:
• OpenRefine example for making data FAIR. Read this walkthrough of how to "FAIRify" a dataset using the data cleaning tool OpenRefine: https://docs.google.com/document/d/1hQ0KBnMpQq93-HQnVa1AR5v4esk6BRlG6NvnnzJuAPQ/edit#heading=h.v3puannmxh4u

Discussion:
• If you had thousands of keywords in a dataset you wanted to associate with a controlled vocabulary relevant to your field, what would be your approach? What tools do you think would best automate this task?

Thing 3: Permanent identifiers

Permanent identifiers (PIDs) are a necessary step for keeping track of data. Web links can break, or "rot", and tracking down data based on a general description can be extremely challenging. A permanent identifier like a digital object identifier (DOI, https://www.doi.org/) is a unique ID assigned to a dataset to ensure that properly managed data does not get lost or misidentified. Additionally, a DOI makes it easier to cite and track the impact of datasets, much like cited journal articles. Identifiers exist for researchers as well: an ORCID (https://orcid.org/) is essentially a DOI for an individual researcher. This ensures that if you have a common name, change your name, change your affiliation, or otherwise change your author information, you still get credit for your own work and maintain a full, identifiable list of your scientific contributions.

Activity 1:
• Go to re3data.org and search for a data repository related to your research subject area. From the repository you choose, pick a dataset. Does it have a DOI? What is it? Who is the creator of that dataset? What is the ORCID of the author?

Activity 2:
You've been given this DOI: 10.6075/J03N21KQ
• What would you do to find the dataset this DOI references?
• Using the approach you just identified, what is associated with this DOI? Who was the creator of this dataset? When was it published? Who funded that research?

Activity 3:
• Go to the ORCID website and create an ORCID if you do not have one already. Can you identify the creator associated with the DOI in Activity 1?

Discussion:
• What would be a positive benefit of having a personal persistent ID such as an ORCID? Are there any drawbacks or concerns?
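Activity 2 can also be done programmatically: a DOI resolves via https://doi.org/, and with HTTP content negotiation you can ask the resolver for citation metadata instead of the landing page. A minimal sketch, assuming Python with the requests library installed:

    import requests

    doi = "10.6075/J03N21KQ"  # the DOI from Activity 2

    # Ask the DOI resolver for citation metadata (CSL JSON) rather than HTML.
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    resp.raise_for_status()
    meta = resp.json()

    # These are standard CSL JSON fields; not every registrar fills all of them.
    print(meta.get("title"))
    print(meta.get("author"))
    print(meta.get("issued"))     # publication date
    print(meta.get("publisher"))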
Thing 4: Citations

Citing data properly is just as important as citing journal articles and other papers. In general, a data citation should include: author/creator, date of publication, title of dataset, publisher/organization (for instance, NOAA), and a unique identifier (preferably a DOI).

Activity 1:
• Read through this overview of citing data from DataONE: https://www.dataone.org/citing-dataone. It has information applicable to any data citation, as well as guidelines specific to DataONE.
• Think of the last dataset you worked with. Is it something you collected, or was it from a public repository? How would you cite this data?
• Websites/data repositories will often provide the text of a preferred citation, but you may have to search for it. How would you cite the World Ocean Database (https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html)? How would you cite data from the Multibeam Bathymetry Database?

Discussion:
Long-term data stewardship is an important factor in keeping data open and accessible for the long term.
• After completing the last activity, discuss how open data is in your discipline. Are there long-term considerations and protocols for the data that is produced?

Tip: Resources that can help make your data more open and accessible, or help protect your data:
• Open Science Framework: https://osf.io/
• Figshare: https://figshare.com/
• Oceanographic data centers

Thing 5: Data formats

Oceanographic data can include everything from maps and images to high-dimensional numeric data. Some data are saved in common, near-universal formats (such as CSV files), while others require specialized knowledge and software to open properly (e.g., netCDF). Explore the intrinsic characteristics of the dataset that influence the choice of format, such as a time series versus a regular 3-D grid of temperature varying in time; robust ways to connect the data with metadata; and size factors (binary versus ASCII files). Think about why a format suited to storing or archiving data is not necessarily the best way to distribute data.

Discussion 1:
• What are the most common data formats used in your field? What level of technical/domain knowledge is required to open, edit, and interact with these data types?

Discussion 2:
• What are the advantages and disadvantages of storing data in plain ASCII, like a CSV file, versus a binary format, like netCDF? Do the characteristics of the data influence that decision, i.e. would the preferred format for a time series differ from that for a numerical model output, or a gene sequence?
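To make the trade-off concrete, the sketch below writes the same small, made-up time series to both a CSV file and a netCDF file. It assumes Python with pandas and xarray installed (xarray also needs a netCDF backend such as netCDF4); the variable names are invented for illustration.

    import numpy as np
    import pandas as pd
    import xarray as xr

    # A made-up daily sea-surface-temperature series.
    times = pd.date_range("2018-01-01", periods=10, freq="D")
    sst = 15 + np.random.rand(10)

    # CSV: human-readable and opens anywhere, but units and other metadata
    # have to live in a separate "readme" file.
    pd.DataFrame({"time": times, "sst_degC": sst}).to_csv("sst.csv", index=False)

    # netCDF: binary and self-describing; attributes travel with the data.
    ds = xr.Dataset({"sst": ("time", sst)}, coords={"time": times})
    ds["sst"].attrs["units"] = "degree_Celsius"
    ds["sst"].attrs["long_name"] = "sea surface temperature"
    ds.to_netcdf("sst.nc")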
Thing 6: Data organization and management

Good data organization is the foundation of your research project. Data often has a longer lifespan than the project it was originally associated with, and may be reused for follow-up projects or by other researchers. Data is critical to solving research questions, but lots of data are lost or poorly managed. Poor data organization can directly impact the project or future reuse.

Activity 1: Considerations for basic data organization and management
Group Discussion 1:
• Is your data file structure something that a new lab member could easily learn, or are datasets organized in a more haphazard fashion?
• Do you have any documentation describing how to navigate your data structures?
Group Discussion 2:
• Talk about where/how you are currently storing the data you are working with. Would another lab member be able to access all your data if needed?

Activity 2: Identifying vulnerabilities
• Scenario 1: Your entire office/lab building burns down overnight. No one is harmed, because no one was there, but all electronics in the building perish beyond hope of repair. The next morning, can you access any of your data?
• Scenario 2: The cloud server you use (everything from Google Drive to GitHub) crashes. Can you still access your most up-to-date data?
Discussion 1:
• In either of the two scenarios, can your data survive the disaster? What in your current practice leaves your data at risk of loss?
Discussion 2:
• Think about a time when you had, or nearly had, a data disaster. How could the disaster have been avoided? What, if anything, have you changed about your data storage and workflow as a result?

The Data Management Plan (DMP)
Some research institutions and research funders now require a Data Management Plan (DMP) for new research projects. Let's talk about the importance of a DMP and what a DMP should cover. Think about it: would you be able to create a DMP?

What is a DMP? A Data Management Plan (DMP) documents how data will be managed, stored and shared during and after a research project. Some research funders are now requesting that researchers submit a DMP as part of their project proposal.

Activity 1:
• Start by watching The DMPTool: A Brief Overview, a 90-second video showing what the DMPTool can do for researchers and data managers: https://youtu.be/xT1by-p5jUw
• Next, review this short introduction to Data Management Plans: https://www.ands.org.au/working-with-data/data-management/data-management-plans
• Now browse through some public DMPs from the DMPTool (https://dmptool.org/public_plans), choose one or two DMPs related to oceanography, and read them to see the type of information they capture.

Activity 2:
There are many Data Management Plan (DMP) templates in the DMPTool.
• Choose one DMP funder template you would potentially use for a grant proposal in the DMPTool. Spend 5-10 minutes starting to complete the template, based on a research project you have been involved with in the past.

Discussion:
• You will have noticed that DMPs can be very short, or extremely long and complex. What do you think are the two or three pieces of information essential to include in every DMP, and why?
• After completing the second activity, what are the strengths and weaknesses of your chosen template?

Thing 7: Re-usable data

There are two aspects to reusability: reusable data, and reusable derived data/process products.

Reusable data
Reusable data is the result of successful implementation of the other "Things" discussed so far. Reusable data (1) has a license which specifies reuse scenarios, (2) is in a domain-suitable format and an "open" format when possible, and (3) is associated with extensive metadata consistent with community and domain standards.

Process/derived data products
What is often overlooked in terms of reusability are the products created to automate research steps. Whether it's using the command line, Python, R, or some other programming platform, automation scripts in and of themselves are a useful output that can be reused. For example, data cleaning scripts can be reapplied to datasets that are continually updated, rather than starting from scratch each time. Modeling scripts can be reused and adapted as parameters are updated. Additionally, these research automation products make any data-related decision you made explicit: if future data users have questions about exclusions, aggregations, or derivations, the methodology used is transparent in these products.

Discussion 1:
• How many people have made public or shared part of their research automation pipeline? If you haven't shared it, what prevented you from sharing?

Discussion 2:
• Are there instances where your own research would have been improved if you had access to other people's process products?

Thing 8: Tools of the trade

When working with your data, there is a selection of proprietary and open source tools available for conducting your research analysis.

Why open source tools?
Open source tools are software tools whose source code is openly available and published for use and/or modification by anyone, free of charge. There are many advantages to using open source tools:
• Low software costs
• Low hardware costs
• Wide community development support
• Interoperable with other open source software
• No vendor lock-in
• Open source licensing

Caution: be selective with the tools you use
There are additional benefits you may hear about open source tools:
• Higher quality software
• Greater security
• Frequent updates
Keep in mind that, in an ideal world, these three benefits are what we all wish for; however, not every open source tool delivers them. When selecting an open source tool, choose a package with a large community of users and developers that has demonstrated long-term support.

Things to consider when using open source tools
Benefits:
• Open source tools often have an active development community. Quality for end users is usually higher because the community members are themselves users of the software being developed. In turn, open source development costs are lower.
• With a larger development community, security problems and vulnerabilities are discovered and fixed quickly. Another major advantage of open source is the possibility to verify exactly which procedures are being applied, avoiding the use of "black boxes" and allowing for a thorough inspection of the methods.
Issues:
• Open source tools are only as good as the community that supports them. Unlike commercial software, there is no official technical support. Additionally, not all open source licenses are permissive.
• Training time can be significant.

If open source tools are not an option and commercial software is necessary for your project, there are benefits and issues to consider when using proprietary or commercial software tools.
Benefits:
• This type of software often comes with official technical support, such as a customer service phone number or email.
Issues:
• Proprietary or commercial tools are often quite expensive at the individual level.
• Universities may have campus-wide licenses, but if you move institutions, you may find yourself without the software you had been using.

Discussion:
• Think about the tools you use for data clean-up, analysis, and for creating visualizations and reports for publications. What were the deciding factors in selecting the applications you used or are using for your project?

Thing 9: Reproducibility

Can you or others reproduce your work? Reproducibility increases the impact, credibility, and reuse of your research. Read through the following best practices to make your work reproducible.

Best practices: Making your project reproducible from the start of the project is ideal.
• Documenting each step of your research - from collecting or accessing data, to data wrangling and cleaning, to analysis - is the equivalent of creating a roadmap that other researchers can follow. Being explicit about your decisions to exclude certain values or adjust certain model parameters, and including your rationale for each step, helps eliminate the guesswork in trying to reproduce your results.
• Consider open source tools. This allows anyone to reproduce the research more easily and removes the question of who has the right license for the software used. This is useful not only for anyone else who wants to test your analysis - often the primary beneficiary is you!

Research often takes months, if not years, to complete, so by starting with reproducibility in mind from the beginning, you can often save yourself time and energy later on.
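One small, concrete practice along these lines is to record your software environment and fix random seeds at the top of an analysis script. A minimal sketch, assuming a Python workflow with NumPy; the seed value and the packages listed are arbitrary examples:

    import sys
    import random
    import platform

    import numpy as np

    # Fix random seeds so stochastic steps give the same result on re-runs.
    random.seed(42)
    np.random.seed(42)

    # Record the environment alongside your results so others can recreate it.
    print("Python:", sys.version)
    print("Platform:", platform.platform())
    print("NumPy:", np.__version__)

    # Alternatively, export the full environment, e.g.:
    #   pip freeze > requirements.txt
    #   conda env export > environment.yml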
Discussion:
Think about a project you have completed or are currently working on.
• What are some of the best practices you have adopted to make your research reproducible for others?
• Were there any pain points that you encountered or are dealing with now?
• Is there something you can do about them now?
• Which of the "Things" previously mentioned in this document would be most useful for making your research more reproducible?

Thing 10: APIs and applications (apps)

APIs (Application Programming Interfaces) allow programmatic access to many databases and tools. They can directly access or query existing data without the need to download entire datasets, which can be very large.

Certain software platforms, such as R and Python, often have packages available to facilitate access to large, frequently used database APIs. For instance, the R package "rnoaa" (https://ropensci.org/tutorials/rnoaa_tutorial/) can access and import various NOAA data sources directly from the R console. You can think of it as using an API from the comfort of a tool you're already familiar with. This not only saves time and computer memory, but also ensures that as databases are updated, so are your results: re-running your code automatically pulls in new data (unless you have specified a more restricted date range).

Activity:
On the ERDDAP server for Spray Underwater Glider data, select temperature data for line 90 (https://spraydata.ucsd.edu/erddap/tabledap/binnedCUGN90.html).
• Restrict it to measurements at 25 m or shallower.
• Choose the format of your preference, and instead of submitting the request, generate a URL.
• Copy and paste the generated URL into your browser.
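The URL that ERDDAP generates is itself an API call, so the same request can be scripted. A minimal sketch using Python and requests; the variable names (time, depth, temperature) are assumptions to check against the dataset page above:

    import requests

    # ERDDAP tabledap request: dataset ID + output format + variables + constraints.
    base = "https://spraydata.ucsd.edu/erddap/tabledap/binnedCUGN90.csv"
    query = "time,depth,temperature&depth<=25"  # 25 m or shallower

    resp = requests.get(f"{base}?{query}", timeout=60)
    resp.raise_for_status()

    # The first two lines of ERDDAP CSV output are column names and units.
    for line in resp.text.splitlines()[:5]:
        print(line)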
Discussion:
• Think about the last online data source you accessed. Is there an API for this data source? Is there a way to access this data from within your preferred analysis software?

TOP 10 FAIR DATA & SOFTWARE THINGS: Research Software

Sprinters: Anna-Lena Lamprecht, Carlos Martinez Ortiz, Chris Erdmann, Leyla Garcia, Mateusz Kuzak, Paula Andrea Martinez

Description: The FAIR data principles are widely known and applied today. What the FAIR principles mean for (scientific) software is an ongoing discussion. However, there are some things on which there is already agreement that they make software (more) FAIR. In this document, we go for some 'low hanging fruit' and describe 10 easy FAIR software things that you can do. To limit the scope, "software" here refers to scripts and packages in languages like R and Python, but not to other kinds of software frequently used in research, such as web services, web platforms like myexperiment.org, or big clinical software suites like OpenClinica. A poster summarizing these 10 FAIR software things is also available.

Audience:
• Researchers who develop software
• Research Software Engineers

Goals: Translate FAIR principles to applicable actions for scientific software

What is FAIR for software
In the context of this document, we use the following simple definition of FAIR for software (see also https://www.go-fair.org/fair-principles/):
• Findable: software has sufficiently rich metadata and a unique persistent identifier.
• Accessible: software metadata is in a machine- and human-readable format, and software and metadata are deposited in a trusted, community-approved repository.
• Interoperable: software uses community-accepted standards and platforms, making it possible for users to run it.
• Reusable: software has a clear licence and documentation.

Things

Findability

Thing 1: Create a description of your software

The name alone does not tell people much about your software. For other people to find out whether they can use it for their purpose, they need to know what it does. A good description of your software will also help other people find it.

Activity: Think of the minimum set of information (metadata) which will help others find your software. This can include a short descriptive text and meaningful keywords. Codemeta (https://codemeta.github.io/terms/) is a set of keywords used to describe software, together with a way to structure them in a machine-readable form. For examples of Codemeta used in software packages, see:
• https://github.com/NLeSC/boatswain/blob/master/codemeta.json
• https://github.com/datacite/maremma
EDAM (http://edamontology.org/page) is an example of an ontology that provides terminology that can be used to describe bioinformatics software. Take the 4OSS lesson episode about metadata and registries (https://softdev4research.github.io/4OSS-lesson/05-use-registry/index.html) and walk through the exercise. For R packages, the DESCRIPTION file plays a similar role; see this example: http://r-pkgs.had.co.nz/description.html#description
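Following the two linked examples, a codemeta.json file is plain JSON built from Codemeta's terms. Below is a minimal sketch generated with Python; every value is a placeholder to replace with your software's actual details.

    import json

    # A minimal, hypothetical Codemeta description; replace all values.
    codemeta = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": "mytool",
        "description": "Short description of what the software does.",
        "codeRepository": "https://github.com/example/mytool",
        "license": "https://spdx.org/licenses/Apache-2.0",
        "programmingLanguage": "Python",
        "keywords": ["oceanography", "data-cleaning"],
        "author": [{"@type": "Person", "givenName": "A.", "familyName": "Researcher"}],
    }

    # Keep codemeta.json in the root of your repository.
    with open("codemeta.json", "w") as f:
        json.dump(codemeta, f, indent=2)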
Thing 2: Register your software in a software registry

People search for research software using search engines like Google. Registering your software in a dedicated registry will make it findable by search engines, because the registries take care of search engine optimization, etc. The registries will usually ask you to provide descriptions (metadata) as above.

Activity: Think of the registries most used in your domain. Do you know about any? How and where do you usually find software? What kind of keywords do you use when searching?
Here are some examples of research software registries:
* bio.tools: https://bio.tools/
* Research Software Directory (check if your institution hosts one): https://github.com/research-software-directory/research-software-directory
* rOpenSci Project: https://ropensci.github.io/
* Zenodo: https://zenodo.org/
See also the 4OSS lesson episode about metadata and registries: https://softdev4research.github.io/4OSS-lesson/05-use-registry/index.html

Thing 3: Get and use a unique and persistent identifier for your software

It will help others find and access the particular version of your software. Unique means that the identifier points to one and only one version and location of your software. Persistent means that it keeps pointing to the same version and location for a long, specified amount of time. For example, Zenodo provides you with a DOI (Digital Object Identifier) that will be resolvable for at least the next 20 years. Recent initiatives, such as Software Heritage, propose permalinks based on intrinsic SHA1 identifiers for software (see, for example, the identifier swh:1:dir:005bc6218c7a0a9ede654f9a177058adcde98a50, resolvable at https://archive.softwareheritage.org/swh:1:dir:005bc6218c7a0a9ede654f9a177058adcde98a50/).

Activity: If you have registered your software in a registry, chances are good that it provides a unique and persistent identifier. If not, obtain an identifier from another organization. If you have multiple identifiers, choose one to use as your main identifier. Make sure you use it consistently when referring to your software, e.g. on your own website, in your code repository, or in publications. See also: Making your code citable with Zenodo (https://guides.github.com/activities/citable-code/).

Accessibility

Thing 4: Make sure that people can download your software

In order for anyone to use your software, they need to be able to download an executable version along with documentation. For interpreted languages like Python and R, the code is also the executable version. For compiled languages like Java and C, the executable version is a binary file, and the code might not be accessible. Downloading the software and documentation is possible, for instance, from a project website, a git repository, or a software registry.

Activity: Using the identifier as your starting point, ask a colleague to try to get your software (binary/script). Can they download it? Do they also have access to the documentation? Is there anything preventing them from getting to it? Is it hosted on a reliable platform (long-term persistent, such as Zenodo, PyPI, CRAN)?

Interoperability

Thing 5: Explain the functionality of your software

Your software performs one or more operations that take an input and transform it into an output. To help people use your software, provide a clear and concise description of the operations along with the corresponding input and output data types. For example, the wc (word count) command line tool (http://man7.org/linux/man-pages/man1/wc.1.html) takes a text as input, counts the number of words in it, and gives the number of words as output. The ClustalW tool (https://www.genome.jp/tools-bin/clustalw) takes a set of (gene or protein) sequences as input, aligns them, and returns a multiple sequence alignment as output.

Activity: List all operations that your software provides, and describe them along with the corresponding input and output data types. If possible, use terms from a domain ontology like EDAM.
Thing 6: Use standard (community agreed) formats for inputs and outputs

In order for people to use your software, they need to know how to feed data to it -- standard formats are easy ways to exchange data between different pieces of software. By sticking to standards, it is possible to use the output from another piece of software as an input to your software (or the other way around). For example, FASTA is a format for representing molecular sequences (DNA, RNA, protein, ...) that most sequence analysis tools can handle. NetCDF is a standard file format used for sharing array-oriented scientific data.

Activity: What are the relevant standards in your field? Which groups/organizations are responsible for standards in your field? Is there a place where you can find the relevant standards and a detailed description? What other tools use these standards? If possible, use such standard formats as the input/output of your software and state which you are using. (Avoid defining your own standards! http://imgs.xkcd.com/comics/standards.png)

Reusability

Thing 7: Document your software

Your software should include sufficient documentation: instructions on how to install, run and use your software. All dependencies of your software should be clearly stated. Provide sufficient examples of how to execute the different operations your software offers, ideally along with example data. The Write the Docs page (https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/) explains and gives examples of good documentation.

Activity: Ask a colleague to look at your software's documentation. Are they able to install your software? Can they run it? Can they produce the expected results?

Thing 8: Give your software a license

A license tells your (potential) users what they are allowed to do with your software (and what not), and can protect your intellectual property. Without a license, people may spend time trying to figure out whether they are allowed to use your software -- make things easy for them. Therefore, it is important that you choose a software license that matches your intentions. The Choose a License website (https://choosealicense.com/) provides a simple guide for picking the right license for your software.

Activity:
* Follow the 4OSS lesson to learn more about licenses and their implications: https://softdev4research.github.io/4OSS-lesson/03-use-license/index.html
* Read the 4OSS paper: https://f1000research.com/articles/6-876/v1

Thing 9: State how to cite your software

You want to get credit for your work. By providing a citation guideline you will help users of your software cite your work properly. There is no single right way to do it. The Software Sustainability Institute website provides more information and discussion on this topic in the blog post How to cite and describe software (https://www.software.ac.uk/how-cite-software).

Activity: Read the "Software citation principles" paper (https://www.force11.org/software-citation-principles). Read the documentation of the Citation File Format (https://citation-file-format.github.io/) and create a CFF file for your software; a web tool is available at https://citation-file-format.github.io/cff-initializer-javascript/
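A CFF file is a small YAML data file, conventionally named CITATION.cff and kept in the root of your repository. Below is a minimal sketch with placeholder values throughout; check the Citation File Format documentation linked above for the current cff-version and the required fields (the cff-initializer web tool generates the same structure).

    # CITATION.cff -- all values below are placeholders
    cff-version: 1.1.0
    message: "If you use this software, please cite it as below."
    title: "mytool"
    version: "1.0.0"
    doi: "10.xxxx/zenodo.placeholder"
    date-released: 2019-02-01
    authors:
      - family-names: "Researcher"
        given-names: "A."
        orcid: "https://orcid.org/0000-0000-0000-0000"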
Thing 10: Follow best practices for software development

Reusability benefits from good software quality. There are a number of actions you can take to improve the quality of your software: make your code modular, have code-level documentation, provide tests, follow code standards, use version control, etc. There are several guidelines which you can use in the process, such as the eScience Center Guide (https://guide.esciencecenter.nl/), the best practices (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745) and the good enough practices (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510).

Activity: Familiarize yourself with the guides provided above. Have a look at your software and create a list of actions which you could follow to improve its quality. Ideally, follow these practices from the very beginning.

TOP 10 FAIR DATA & SOFTWARE THINGS: Research Libraries

Sprinters: Liz Stokes, Natasha Simons, Tom Honeyman (Australian Research Data Commons), Chris Erdmann (Library Carpentry/The Carpentries/California Digital Library), Sharyn Wise (University of Technology, Sydney), Josh Quan, Scott Peterson, Amy Neeser (UC Berkeley)

Description: To translate FAIR principles into usable concepts for research-facing support staff (e.g. librarians).

Audience:
• Library staff who provide research support
• Those who want to know more about FAIR and how it could be applied to libraries

Goals:
• Translating FAIR speak to library speak (What is it? Why do I need to know? What do I tell researchers?)
• Identifying ways to improve the 'FAIRness' of your library
• Understanding that FAIR data helps us be better stewards of our own resources

Things

Thing 1: Why should librarians care about FAIR?

There's a lot of hype about the FAIR Data Principles. But why should librarians care? For starters, libraries have a strong tradition of describing resources, providing access, building collections, and supporting the long-term stewardship of digital resources. Building on this specific knowledge and expertise, librarians should feel confident about making research data FAIR. So how can you and your library get started with the FAIR principles?

Activity:
1. Read LIBER's Implementing FAIR Principles: the role of Libraries at https://libereurope.eu/wp-content/uploads/2017/12/LIBER-FAIR-Data.pdf (5 minute read)
Consider:
* Where is your library at in regard to the section on 'getting started with FAIR'?
* Where are you at in your own understanding of the FAIR Data Principles?

Thing 2: How FAIR are your data?

The FAIR Principles are easily understood in theory but more challenging to apply in practice. In this exercise, you will use the Australian Research Data Commons (ARDC) FAIR data self-assessment tool (https://www.ands-nectar-rds.org.au/fair-tool) to assess the 'FAIRness' of one of your library's datasets.

Activity:
1. Select a metadata record from your library's collection (e.g. your institutional repository) that describes a published dataset.
2. Open the ARDC FAIR Data Assessment tool and run your chosen dataset against the tool to assess its 'FAIRness'.
Consider:
* How FAIR was your chosen dataset?
* How easy was it to apply the FAIR criteria to your dataset?
* What needs to happen in order to improve the 'FAIRness' of your chosen dataset?

Want more?
Try your hand at other tools like the CSIRO 5 star data rating tool (https://doi.org/10.4225/08/5a12348f8567b) and the DANS FAIR data assessment tool (https://www.surveymonkey.com/r/fairdat).

Thing 3: Do you teach FAIR to your researchers?

How FAIR aware are your researchers? Does your library incorporate FAIR into researcher training?

Activity: Go to existing data management/data sharing training you provide to graduates, Higher Degree Researchers (HDRs) or other researchers. For example, review the Duke Graduate School's Responsible Conduct of Research topics page (https://gradschool.duke.edu/professional-development/programs/responsible-conduct-research/rcr-topics). Review how well the 15 FAIR Principles (https://www.force11.org/group/fairgroup/fairprinciples) are covered in this training and adjust accordingly.

Thing 4: Is FAIR built into library practice and policy?

Your library may do a great job advocating the FAIR Data Principles to researchers, but how well have the Principles been incorporated into library practice and policy?

Activity:
1. Review your library or institutional policies regarding research data management and digital preservation with the FAIR Principles in mind. Consider that in most cases library policy will have been written before the advent of FAIR. Are revisions required?
2. Review the data repository managed by your library. How well does it support FAIR data?
3. Review your library's data management planning tool. Does it have features that support the FAIR Data Principles, or are changes required?

Thing 5: Are your library staff trained in FAIR?

Activity:
* Conduct a skills and knowledge audit regarding FAIR with your library team.
* Based on the audit, identify gaps in FAIR skills and knowledge.
* Design a training program that can fill the identified gaps. To help build your program, read the blog post A Carpentries based approach to teaching FAIR data and software principles (https://uc3.cdlib.org/2018/07/24/a-carpentries-based-approach-to-teaching-fair-data-and-software-principles/).
Consider:
* Reuse the wide range of openly available training materials on the FAIR Data Principles, e.g. you could start here: https://www.ands.org.au/working-with-data/fairdata/training

Thing 6: Are digital libraries FAIR?

While the FAIR Principles are designed for data, considering their potential application in a broader context is useful. For example, think about what criteria might be applied to assess the 'FAIRness' of digital libraries. Considerations might include:
* Persistent identifiers
* Open access vs. paid access
* Provenance information / metadata
* Author credibility
* Versioning information
* License / reuse information
* Usage statistics (number of times downloaded)

Activity:
1. Select one of these digital libraries (or another of your choice):
* British Library: https://www.bl.uk/
* National Digital Library of India: https://ndl.iitkgp.ac.in/
* Europeana: https://www.europeana.eu/
* National Library of Australia's Trove: https://trove.nla.gov.au/
2. Search/browse the catalogue of items.
Consider:
* Does the library display reuse permissions/licenses on how to use the item?
* Is there provenance information?
* Are persistent identifiers used?

Thing 7: Does your library support FAIR metadata?

A number of FAIR principles make reference to "metadata".
What is metadata, how is it relevant to FAIR, and does your library support the kind of metadata specified in the FAIR Data Principles?

Activity:
1. Watch this video in which the Metadata Librarian explains metadata (3 mins): https://www.youtube.com/watch?v=ABF2FvSPVYE
2. Select three metadata records at random for datasets held in your library or repository collection.
3. Open the checklist produced for use at the EUDAT summer school (http://doi.org/10.5281/zenodo.1065991) and see if you can check off the items that reference metadata against the records you selected.
4. Make a list of what metadata elements could be improved in your library records to enable better support for FAIR.

Thing 8: Does your library support FAIR identifiers?

The FAIR data principles call for open, standardised protocols for accessing data via a persistent identifier. Persistent identifiers are crucial for the findability and identification of research and researchers, and for tracking impact metrics. So how well does your library support persistent identifiers?

Activity: Find out how well your library supports ORCIDs (https://orcid.org/) and DOIs (http://www.doi.org/):
* Do your library systems support the identification of researchers via an ORCID? Do you authenticate against the ORCID registry? Do you have an ORCID?
* Do your library systems, such as your institutional repository, support the issuing of Digital Object Identifiers (DOIs) for research data and related materials?
Consider:
* What other types of persistent identifiers do you think your library should support? Why or why not?
Want more? If your library supports the minting of DOIs for research data and related materials, is there more that you could do in this regard? Check out A Data Citation Roadmap for Scholarly Repositories (https://doi.org/10.1101/097196) and determine how much of the roadmap you can check off your list and how much is yet to do.

Thing 9: Does your library support FAIR protocols?

For (meta)data to be accessible, it should ideally be available via a standard protocol. Think of protocols in terms of borrowing a book: there are a number of expectations that the library lays out in order to proceed. You have to identify yourself using a library card, you have to bring the book to the checkout desk, and in return you walk out of the library with a demagnetised book and a receipt reminding you when the book is due back. Accessing the books in the library means that you must learn and abide by the rules for accessing books.

Activity:
* Familiarise yourself with APIs by completing Thing 19 of the ANDS 23 (research data) Things: https://www.ands.org.au/working-with-data/skills/23-research-data-things/all23/thing-19
* Consider the APIs your library provides to enable access to (meta)data for data and related materials (see the sketch below for one common protocol). Are they up to scratch, or are improvements required?
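A concrete example of such a standard protocol is OAI-PMH, which many repositories use to expose metadata records over plain HTTP. The sketch below is a minimal illustration in Python with the requests library, using Zenodo's public OAI-PMH endpoint; any OAI-PMH-capable repository answers the same verbs.

    import requests

    # OAI-PMH is a simple HTTP protocol: a base URL plus a "verb" parameter.
    base = "https://zenodo.org/oai2d"

    # Identify: basic information about the repository.
    print(requests.get(base, params={"verb": "Identify"}, timeout=30).text[:500])

    # ListRecords: harvest Dublin Core metadata records.
    resp = requests.get(
        base,
        params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
        timeout=30,
    )
    print(resp.text[:500])  # XML; parse with xml.etree.ElementTree in real use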
Thing 10: Next steps for your library in supporting FAIR

In Thing 1 you read LIBER's Implementing FAIR Principles: the role of Libraries and considered what your library needed to do in order to better support FAIR data. In Thing 10 we will create a list of outstanding action items.

Activity:
1. Write a list of what your library is currently doing to support and promote the FAIR Data Principles.
2. Now compare this to the list in the LIBER document. Where are the gaps, and what can you do to fill them?
3. Create an action plan to improve FAIR support at your library!
Consider:
* Incorporate all that you learnt, and the progress that you made, in "doing" this Top 10 FAIR Things!

TOP 10 FAIR DATA & SOFTWARE THINGS: Research Data Management Support

Sprinters: Lena Karvovskaya, Otto Lange, Iza Witkowska, Jacques Flores (Research Data Management (RDM) Support at Utrecht University)

Description: This is an umbrella-like document with links to various resources. The aim of the document is to help researchers who want to share their data in a sustainable way. However, we consider the border between librarians and researchers to be a blurred one, because, ultimately, librarians support researchers who would like to share their data. We primarily wish to target researchers and support staff regardless of their experience: those who have limited technical knowledge and want to achieve a very general understanding of the FAIR principles, and those who are more advanced technically and want to make use of more technical resources. The resources we link to for each of the 10 FAIR Things will therefore often be on two levels of technicality.

Audience: Our primary audience consists of researchers and support staff at Utrecht University. Therefore, whenever possible we will use the resources available at Utrecht University: the institutional repositories and the resources provided at the RDM Support website (https://www.uu.nl/en/research/research-data-management).

Things

Thing 1: Why bother with FAIR?

Background: The advancement of science thrives on the timely sharing and accessibility of research data. Timely and sustainable sharing is only possible if there are infrastructures and services that enable it.
1. Read up on the role of libraries in implementing the FAIR Data Principles (https://libereurope.eu/wp-content/uploads/2017/12/LIBER-FAIR-Data.pdf). Think about the advantages and opportunities made possible by digitalization in your research area. Think about the challenges. Have you or your colleagues ever experienced data loss? Is the falsification/fabrication of data an issue with digital data? How easy is it to figure out whether data you found online is reliable? Say you found a very useful resource available online and you want to refer to it in your work; can you be sure that it will still be there several years later?
2. For more information, you can refer to this detailed explanation of the FAIR principles (https://www.go-fair.org/fair-principles/) developed by the Dutch Techcentre for Life Sciences (DTL).

Thing 2: Metadata

Background: Metadata is information about data. This information allows data to be findable and potentially discoverable by machines. Metadata can describe the researchers responsible for the data; when, where and why the data was collected; how the research data should be cited; and so on.
1. If you find the discussion of metadata too abstract, think about a traditional library catalogue record as a form of metadata.
A library catalogue card holds information about a particular book in a library, such as author, title, subject, etc. Library cataloging, as a form of metadata, helps people find books within the library. It provides information about books that can be used in various contexts. Now, reflect on the differences in functionality between a paper catalogue card and a digital metadata file.
2. Reflect on your own research data. If someone who is unfamiliar with your research wanted to find, evaluate, understand and reuse your data, what would they need?
3. Watch this video about structural and descriptive metadata (https://www.youtube.com/watch?v=L0vOg18ncWE) and reflect on the example provided in the video. If the video piqued your interest in metadata, watch a similar video on the ins and outs of metadata and data documentation by Utrecht University: https://www.youtube.com/watch?v=h0oZ3swbTJ0

Thing 3: The definition of FAIR metrics

Background: FAIR stands for Findable, Accessible, Interoperable and Re-usable.
1. Take a look at the image provided by the Australian Research Data Commons (ARDC) illustrating the FAIR acronym. Reflect on the images chosen for the various aspects of the acronym. If we consider the video already mentioned in Thing 2, how would you describe the photography example in terms of FAIR?
2. Go to DataCite (https://search.datacite.org/) and choose the data center "Utrecht University". Select one of the published datasets and evaluate it with respect to the FAIR metrics. In evaluating the dataset, you can make use of the FAIR data self-assessment tool created by ARDC (https://www.ands-nectar-rds.org.au/fair-tool). Which difficulties do you experience while trying to do the evaluation?
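The same search can be scripted against the DataCite REST API. A minimal sketch with Python and requests; the query parameters are illustrative, not the only way to filter:

    import requests

    # Search DataCite for datasets matching a query string.
    resp = requests.get(
        "https://api.datacite.org/dois",
        params={"query": "Utrecht University", "resource-type-id": "dataset"},
        timeout=30,
    )
    resp.raise_for_status()

    # Print DOI and title for the first few hits.
    for item in resp.json()["data"][:5]:
        attrs = item["attributes"]
        titles = attrs.get("titles") or [{}]
        print(attrs.get("doi"), "-", titles[0].get("title"))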
3. Read about persistent identifiers on a very general level (awareness).

Thing 6: Documentation
1. Browse through the general overview of data documentation provided by the Consortium of European Social Science Data Archives (CESSDA). Think of the principal differences between object-level documentation of quantitative and qualitative data.

Thing 7: Formats and standards
1. Take a look at the data formats recommended by DANS. Which of these formats are relevant for your subject area and for your data? Do you use any of the non-preferred formats? Why?
2. Read the background information about file formats and data conversion provided by CESSDA. Reflect on the difference between short-term and long-term oriented formats. Think of a particular example of changing from a short-term processing format to a long-term preservation format that is relevant for your field.

Thing 8: Controlled vocabulary
Background: The use of shared terminologies strengthens communities and increases the exchange of knowledge. When researchers refer to specific terms, they rely on a common understanding of these terms within the relevant community. Controlled vocabularies are concerned with the commitment to the terms and management standards that people use.
1. Browse Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Reflect on possible issues that may arise if there is no agreement on the use of a controlled vocabulary within a research group.
https://www.uu.nl/en/research/research-data-management/tools-services/tools-for-storing-and-managing-data/storage-solutions https://www.ands.org.au/__data/assets/pdf_file/0006/715155/Digital-Object-Identifiers.pdf https://www.youtube.com/watch?v=PgqtiY7oZ6k https://www.ands.org.au/guides/persistent-identifiers-awareness https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/2.-Organise-Document/Documentation-and-metadata https://dans.knaw.nl/en/deposit/information-about-depositing-data/before-depositing/file-formats https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/3.-Process/File-formats-and-data-conversion https://www.cessda.eu/ https://www.webarchive.org.uk/wayback/archive/20160101151732/http:/www.jiscdigitalmedia.ac.uk/guide/controlling-your-language-links-to-metadata-vocabularies
2. Consider the following example from earth science research: "to be able to adequately act in the case of major natural disasters such as earthquakes or tsunamis, scientists need to have knowledge of the causes of complex processes that occur in the earth's crust. To gain the necessary insights, data from different research fields are combined. This is only possible if researchers from the different applicable sub-disciplines 'speak the same language'". Choose a topic within your research interests that requires combining data from different sub-disciplines. Think about some differences in vocabularies between these sub-disciplines.

Thing 9: Use a license
Background: A license states what a user is allowed to do with your data, and creates clarity and certainty for potential users.
1. Take a look at the various Creative Commons licences. Which licenses put the least restrictions on data? You can make use of the Creative Commons guide to figure this out.
2. Watch this video about Creative Commons licences.
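As an aid for the question of which licences put the least restrictions on data, the list below orders the main Creative Commons licences roughly from least to most restrictive. The ordering is an illustrative reading of the licence terms, not an official ranking; consult the Creative Commons guide for the details.

```python
# Main Creative Commons licences, ordered *roughly* from least to most
# restrictive for data reuse. Illustrative ordering only.
CC_LICENSES = [
    ("CC0 1.0",      "https://creativecommons.org/publicdomain/zero/1.0/"),  # public-domain dedication
    ("CC BY 4.0",    "https://creativecommons.org/licenses/by/4.0/"),        # attribution required
    ("CC BY-SA 4.0", "https://creativecommons.org/licenses/by-sa/4.0/"),     # attribution + share-alike
    ("CC BY-NC 4.0", "https://creativecommons.org/licenses/by-nc/4.0/"),     # non-commercial only
    ("CC BY-ND 4.0", "https://creativecommons.org/licenses/by-nd/4.0/"),     # no derivative works
]

for name, url in CC_LICENSES:
    print(f"{name:13} {url}")
```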
Thing 10: FAIR and privacy
Background: The General Data Protection Regulation (GDPR) and its Dutch implementation, the Algemene Verordening Gegevensbescherming (AVG), require parties handling data to provide clarity and transparency where personal data are concerned.
1. Take a look at the Handling personal data guide on the Utrecht University RDM website. Reflect on how personal data can be FAIR.
https://www.uu.nl/en/research/research-data-management/tools-services/designing-metadata-schemes https://creativecommons.org/licenses/ http://creativecommons.org/choose/ https://www.youtube.com/watch?v=HyWdeNQ7fo0 https://gdpr-info.eu/ https://autoriteitpersoonsgegevens.nl/nl/onderwerpen/avg-nieuwe-europese-privacywetgeving/algemene-informatie-avg https://www.uu.nl/en/research/research-data-management/guides/handling-personal-data

TOP 10 FAIR DATA & SOFTWARE THINGS: International Relations

Sprinter: Fiona Bradley, UNSW Library, and University of Western Australia (PhD Candidate)

Description: International Relations researchers increasingly make use of and create their own datasets in the course of research, or as part of broader research projects. The funding landscape in the discipline is mixed: some researchers receive significant grants subject to Open Access and Open Data compliance, while others are not funded for specific outputs. Datasets have many sources: they may be derived from academic research or, increasingly, from large-N datasets produced by polling organisations such as YouGov and Gallup, third-party datasets produced by non-governmental organisations (NGOs) that undertake human rights monitoring, or official government data. There is a wide range of licensing arrangements in place, and many different places to store this data.

What is FAIR data? FAIR data is findable, accessible, interoperable and reusable. For more information, take a look at the FORCE11 definition.

Audience: International relations and human rights researchers
Goal: Help researchers understand FAIR principles

Things

Thing 1: Getting started
Is there a difference between open and FAIR data? Find out more: https://www.go-fair.org/faq/ask-question-difference-fair-data-open-data/
http://orcid.org/0000-0002-3622-2794 https://www.force11.org/group/fairgroup/fairprinciples
ACTIVITY: Are there examples in your own research where you have used or created data that may be FAIR, but may not necessarily be open?
* Does the material you used or created include personal information?
* Does it include culturally sensitive materials?
* Does it reveal information that endangers or reveals the location of human rights defenders, whistleblowers, or other people requiring protection?
* Does it involve material subject to commercial agreements?

Thing 2: Discovering data
United Nations (UN) agencies, international organisations, governments, NGOs, and researchers all produce and share data. Some data are very easy to use: they are well-described, and a comprehensive code book may be supplied. Other data may need significant clean-up, especially if definitions or country borders have changed over time, as they often have in longitudinal datasets.
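As a small illustration, the sketch below harmonises country names in a longitudinal table using the pandas library. The data and the name mapping are fabricated for the example; real projects often rely on standard country codes (e.g. ISO 3166) instead of free-text names.

```python
import pandas as pd

# Invented example: the same country appears under different names over time.
rows = [
    {"country": "Zaire",                            "year": 1995, "score": 3},
    {"country": "Democratic Republic of the Congo", "year": 2005, "score": 4},
    {"country": "DR Congo",                         "year": 2015, "score": 5},
]
df = pd.DataFrame(rows)

# A hand-made mapping to one canonical name.
canonical = {
    "Zaire": "Democratic Republic of the Congo",
    "DR Congo": "Democratic Republic of the Congo",
}
df["country"] = df["country"].replace(canonical)

# After harmonisation, the whole time series belongs to a single country.
print(df.groupby("country")["score"].mean())
```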
A selection of the types of datasets available is linked below:
• Polity IV dataset
• World Bank Open Data
• ITU Global IT statistics
• Freedom House reports
• American Journal of Political Science (AJPS) Dataverse
• UK dataset guidelines (provides advice on using many open datasets)
• ICPSR: Inter-university Consortium for Political and Social Research

Thing 3: Data identifiers
A unique, permanent link helps make it easy to identify and find data. A Digital Object Identifier (DOI) is a widely used identifier, but not the only one available. If you are contributing a dataset to an institutional repository or discipline repository, these services may 'mint' a DOI for you to attach to your dataset. Zenodo is an example of an open repository that will provide a DOI for your dataset. The AJPS Dataverse and UK Data Service, linked in Thing 2, both use DOIs to identify datasets.

Thing 4: Data citation
Using someone else's dataset? Or want to make sure you are credited for use of data? The Make Data Count initiative and DataCite are developing guidelines to ensure that data citations are measured and credited to authors, in the same way as other research outputs. Currently many researchers, NGOs, and organisations contribute data to the UN system or at national level to show progress on the UN 2030 Agenda for Sustainable Development, including the Sustainable Development Goals. There are several initiatives aimed at strengthening national data, including national statistical office capacity, disaggregated data, third-party data sources, and scientific evidence.
http://www.systemicpeace.org/polityproject.html https://data.worldbank.org/ https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx https://freedomhouse.org/reports https://dataverse.harvard.edu/dataverse/ajps https://www.ukdataservice.ac.uk/use-data/guides/dataset-guides https://www.icpsr.umich.edu/icpsrweb/ https://zenodo.org/ https://makedatacount.org/ https://www.datacite.org/ https://sustainabledevelopment.un.org/ http://www.data4sdgs.org/

Thing 5: Data licensing
Depending on your funder, publisher, or the purpose of your dataset, you may have a range of data licensing compliance requirements, or options. Creative Commons is one licensing option. The Australian Research Data Commons (formerly known as the Australian National Data Service) provides a guide with workflows for understanding how to licence your data in Australia.
ACTIVITY: When might a Creative Commons licence not be appropriate for your data? For example:
* When you are working on a contract and the contracting body does not permit it?
* When you are producing data for a body with a more permissive licence or a different licencing scheme in place?
* When you are producing data on behalf of a body with an Open Government Data licence? (The linked example is for the UK.)
* Are there other examples?

Thing 6: Sensitive data
Research involving human rights, regime change in fragile and conflict states, or interviews with security officials is among the cases where data may be sensitive and need careful handling. In these cases, the procedures used in collecting the data must remain secure, and the data may be FAIR but not open, or may require specific access protocols and mediated access.
See:
• A human rights-based approach to data, UN OHCHR
• Data security, UK Data Service

Thing 7: Data publishing
Data sharing policies in political science and international relations journals vary widely.
See:
• Data policies of highly-ranked social science journals
ACTIVITY:
* What might some general data requirements look like for international relations? Are the Data Access, Production Transparency, and Analytic Transparency guidelines for the APSR (American Political Science Review) helpful?
* Or do you prefer a less defined set of criteria, such as that set out by International Organization?
https://wiki.creativecommons.org/wiki/Data_and_CC_licenses https://www.ands.org.au/guides/research-data-rights-management https://datacatalog.worldbank.org/public-licenses http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ https://www.ohchr.org/Documents/Issues/HRIndicators/GuidanceNoteonApproachtoData.pdf https://www.ukdataservice.ac.uk/manage-data/store/security https://osf.io/preprints/socarxiv/9h7ay https://www.apsanet.org/APSR-Submission-Guidelines http://iojournal.org/data-archive/

Thing 8: Funder requirements
Funder requirements vary. Gary King has compiled the policies of most major social science funders (and journals; see Thing 7).

Thing 9: Data sharing
Your funder or publisher may set requirements for data sharing, either as 'supplementary data' or in a data repository. But what if you aren't funded, and aren't required to provide supplementary data or comply with data publishing conditions? Make it a habit and practice to prepare and release your datasets as FAIR data when appropriate. Choose a repository, claim an identifier (Thing 3), and licence it appropriately (Thing 5). Add links to your homepage and ORCID profile.
See:
• Guide to Social Science Data Preparation and Archiving

Thing 10: Learn more
The Carpentries provide training and workshops on fundamental data skills for research.
https://gking.harvard.edu/pages/data-sharing-and-replication https://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/ https://carpentries.org/

TOP 10 FAIR DATA & SOFTWARE THINGS: Humanities: Historical Research

Sprinters: Kristina Hettne, Peter Verhaar (Centre for Digital Scholarship at Leiden University), Ben Companjen, Laurents Sesink, Fieke Schoots (Centre for Digital Scholarship at Leiden University, reviewer), Erik Schultes (GO FAIR, reviewer), Rajaram Kaliyaperumal (Leiden Universitair Medisch Centrum, reviewer), Erzsebet Toth-Czifra (DARIAH, reviewer), Ricardo de Miranda Azevedo (Maastricht University, reviewer), Sanne Muurling (Leiden University Library, reviewer).

Description: This document offers a concise overview of the ten topics that are most essential for scholars in the field of historical research who aim to publish their data sets in accordance with the FAIR principles. In historical research, research data mostly consist of databases (spreadsheets, relational databases), text corpora, images, interviews, sound recordings or video materials.

Things

Findable
To ensure that data sets can be found, scholars need to deposit their data sets and all the associated metadata in a repository which assigns persistent identifiers.

Thing 1: Data repositories
Data repositories enable researchers to share their data sets. The following data repositories accept data sets in the field of history:
• DANS EASY
• Figshare
• Zenodo
• B2SHARE
A number of additional data repositories can be found by going to re3data.org and clicking on Browse > Browse by subject > History.
https://twitter.com/kristinahettne https://twitter.com/pverhaar?lang=en https://www.universiteitleiden.nl/en/staffmembers/ben-companjen https://www.universiteitleiden.nl/en/staffmembers/laurents-sesink#tab-1 https://www.universiteitleiden.nl/en/staffmembers/fieke-schoots#tab-1 https://orcid.org/0000-0001-8888-635X https://www.lumc.nl/org/humane-genetica/medewerkers/rajaram-kaliyaperumal?setlanguage=English&setcountry=en https://openmethods.dariah.eu/erzsebet-toth-czifra/ https://www.linkedin.com/in/ricardo-de-miranda-azevedo-b0b95b26/ https://www.universiteitleiden.nl/en/staffmembers/sanne-muurling#tab-1 https://easy.dans.knaw.nl/ui/home https://figshare.com/ https://zenodo.org/ https://b2share.eudat.eu/records/new https://www.re3data.org/
Several of these repositories also expose a public search API; the sketch below queries Zenodo's.
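A minimal sketch of searching Zenodo programmatically with the requests library. The parameters follow the public API documentation at https://developers.zenodo.org/ as I understand it; treat both the query and the response fields as a sketch rather than a definitive reference.

```python
import requests

# Search Zenodo's published records for datasets mentioning "history".
resp = requests.get(
    "https://zenodo.org/api/records",
    params={"q": "history", "type": "dataset", "size": 5},
)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    meta = hit["metadata"]
    print(meta.get("doi", "(no DOI)"), "-", meta["title"])
```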
Choosing a repository that complies with the CoreTrustSeal criteria for long-term repositories is recommended. This way, the durable findability of the data is guaranteed.
ACTIVITIES:
1. Study the data set that can be found via https://doi.org/10.17026/dans-zw3-fkxb. How can the dataset be downloaded? Which formats are available?

Thing 2: Metadata
Once a certain data repository has been selected, the data set can be submitted, together with the metadata describing this data set. Metadata is commonly described as data about data. In the context of data management, it is structured information about a data set which describes characteristics such as the quality, the format and the contents. Most repositories require a minimum set of metadata, such as the name of the creator, the title and the year of creation. Check what kind of metadata the repository you choose asks for. Remember that the effort you put into metadata will contribute to the findability of your dataset. Metadata are often captured using a fixed metadata schema. A schema is a set of fields which can be used to record a particular type of information. The format of the metadata is often prescribed by the data repository which will manage the data set.
ACTIVITIES:
1. Read the Digital Scholarship @ Leiden blog to learn about metadata for humans and machines.
2. Log in at Zenodo.org and click on Upload > New Upload. On the web page that appears, take stock of the various metadata fields that need to be completed. Zenodo is an international repository; different countries and institutions might have other preferred repositories, such as DANS EASY. DANS EASY lists the following specific requirements for the historical sciences: 1) a description of the (archival) sources; 2) the selection procedure used; 3) the way in which the sources were used; and 4) which standards or classification systems (such as HISCO) were used. Read more at https://dans.knaw.nl/en/deposit/information-about-depositing-data/before-depositing

Thing 3: Persistent identifiers
Datasets need to be deposited in repositories that assign persistent identifiers (PIDs) to ensure that online references to publications, research data, and persons remain available in the future. A PID is a specific type of Uniform Resource Identifier (URI), which is managed by an organisation that links a persistent identification code with the most recent Uniform Resource Locator (URL).
Academic journals mostly work with DOIs. DOIs are globally unique identifiers that provide persistent access to publications, datasets, software applications, and a wide range of other research results. DOI has been an ISO standard since 2012. A typical DOI looks as follows: http://doi.org/10.17026/dans-x4b-uy8q. When users click on this DOI, the DOI is resolved to an actual web address.
https://www.coretrustseal.org/ https://doi.org/10.17026/dans-zw3-fkxb https://digitalscholarshipleiden.nl/articles/metadata-4-machines-help-you-find-and-reuse-relevant-research-data https://zenodo.org/ https://easy.dans.knaw.nl/ https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
Next to identifiers for data sets and for publications, it is also possible to create PIDs for people. The Open Researcher and Contributor Identifier (ORCID) is an international system for the persistent identification of academic authors. It is a non-proprietary system, managed by an international consortium consisting of universities, national libraries, research institutes and data repositories. When your research results are associated with an ORCID, this information can be exchanged effectively across databases, across countries and across academic disciplines. You always retain full control over your own ORCID iD. It is the de facto standard when submitting a research article or grant application, or when depositing research data.
ACTIVITIES:
1. Watch the video "Persistent identifiers and data citation explained" by Research Data Netherlands.
2. Watch the video "What are persistent identifiers" for an example of how they are used in digital heritage.
3. If you don't have one, request an ORCID. Add all your information as completely as possible.
4. Read Alice Meadows' blog post Six Things to do now you have an ORCID iD.
5. Go to a data record and click on the DOI to see how the DOI is resolved to the current URL of the data set: http://dx.doi.org/10.17026/dans-x4b-uy8q.
6. Read "Digital Object Identifier (DOI) System for Research Data".

Accessible

Thing 4: Open data
The FAIR principles stipulate that data and metadata ought to be "retrievable by their identifier using a standardised communication protocol" (requirement A1). This requirement does not necessarily imply that the data should be fully available in open access. It principally means that there needs to be a protocol that users may follow to obtain the data set. There can be many good reasons for limiting the access to a file: public accessibility may be difficult because of privacy laws or copyright protection regulations, for example. The accessibility of the data may occasionally be complicated by the fact that the data have been stored in a so-called proprietary format, i.e. a format that is owned exclusively by a single company. For formats which are associated with specific software applications, it can be difficult to guarantee long-term usability, accessibility and preservation. For this reason, the DANS EASY archive in the Netherlands works with a list of 'preferred formats'.
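As a concrete illustration of moving from a short-term processing format to a preservation-friendly one, the sketch below converts an Excel workbook to CSV with pandas. The file names are placeholders.

```python
import pandas as pd  # reading .xlsx additionally requires the openpyxl package

# Placeholder file name: a workbook in a proprietary processing format.
df = pd.read_excel("research_data.xlsx", sheet_name=0)

# CSV is plain text, readable without proprietary software, and appears on
# DANS's list of preferred formats for tabular data; UTF-8 keeps accented
# characters intact.
df.to_csv("research_data.csv", index=False, encoding="utf-8")
```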
ACTIVITIES:
1. Read the article on the website of DANS about preferred formats, and about what you can do to improve the durability of non-preferred formats.
2. Read the web page on open data on the ANDS website.
3. Consider the following three articles. To what extent can the data sets that are mentioned in the articles be accessed? Are the data sets also in preferred formats?
* https://doi.org/10.1080/0969594X.2016.1194257
* http://dx.doi.org/10.1371/journal.pone.0139563
* http://doi.org/10.1111/lang.12172
4. Look at the data set that can be found via https://doi.org/10.17026/dans-x5u-usxj. What is needed to access the data?
http://doi.org/10.17026/dans-x4b-uy8q https://orcid.org/ https://www.youtube.com/watch?v=PgqtiY7oZ6k https://www.youtube.com/watch?v=AUvMLzdgB3Y&feature=youtu.be https://orcid.org/register http://orcid.org/blog/2015/07/23/six-things-do-now-you%E2%80%99ve-got-orcid-id http://dx.doi.org/10.17026/dans-x4b-uy8q https://www.ands.org.au/__data/assets/pdf_file/0006/715155/Digital-Object-Identifiers.pdf https://www.go-fair.org/fair-principles/542-2/ https://dans.knaw.nl/en/deposit/information-about-depositing-data/before-depositing/file-formats https://www.ands.org.au/working-with-data/articulating-the-value-of-open-data/open-data

Interoperable

Thing 5: Data structuring and organisation
Well-structured and well-organised data can be reused much more easily. This section explains how researchers can organise their data in such a way that they can be analysed effectively with data science tools. Many historians capture their data in spreadsheets. As explained by Broman and Woo (2018), there are a number of important principles to bear in mind when you work with spreadsheets.
• Be consistent. Use terminology consistently.
• Avoid empty cells. Use a consistent code for data which is unavailable, such as the 'NA' used in R.
• Use a regular format for dates, such as YYYY-MM-DD.
• Use all cells to capture atomic data. Do not place multiple values in a single cell. Every value that you may want to use in calculations or in other analyses needs to be available separately.
• Organise the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row).
• Do not use colours to indicate properties of the data. Represent all data that you need as actual values in the spreadsheet.
• Do not include calculations in the raw data files.
Once you have developed a suitable data model, you are also advised to develop a data dictionary which documents the model. This document may contain the following information:
• A list of all the column names used in the data spreadsheet.
• A description of the purpose and the contents of these different columns.
• If applicable, an indication of the units of measurement.
• If applicable, a description of the measures that have been taken to ensure the correctness and the consistency of the data.
• An explanation of abbreviations or notational conventions that have been used in the data set.
ACTIVITIES: Read Karl Broman and Kara H. Woo, "Data organization in spreadsheets".
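Several of these principles can also be checked automatically. A minimal sketch with pandas; the file and column names are placeholders for your own data.

```python
import pandas as pd

df = pd.read_csv("dataset.csv")  # placeholder file name

# Principle: avoid empty cells -- report columns containing any.
empty = df.isna().sum()
print("Empty cells per column:\n", empty[empty > 0])

# Principle: use a regular YYYY-MM-DD date format.
bad_dates = ~df["date"].astype(str).str.match(r"^\d{4}-\d{2}-\d{2}$")
print("Rows with irregular dates:", int(bad_dates.sum()))

# Principle: one atomic value per cell -- flag cells holding lists like "a; b".
multi = df["place"].astype(str).str.contains(";")
print("Cells with multiple values:", int(multi.sum()))
```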
Thing 6: Controlled vocabularies and ontologies
Tim Berners-Lee, the inventor of the Web, argued that there are five levels of open data. Creators of data can earn five stars by following the steps below.
1. A data set can be awarded one star if it has been made public. This is clearly the case for data which have been published via an open license in a data repository.
2. In order to win a second star, the open data needs to be made available as machine-readable data. This criterion can be satisfied by providing access to an Excel spreadsheet, for instance.
3. One disadvantage of an Excel spreadsheet is that users need proprietary software to open the data. The third star can be awarded to datasets which are captured in open formats, such as CSV or TXT.
4. A fourth star can be awarded when the entities in the data set are identified using persistent identifiers. Such PIDs have the effect that other researchers can link directly to the data set.
5. The fifth star can be earned by linking the data to entities in other data sets via PIDs.
When researchers have published their well-structured and well-organised data set in a data repository via a public license, as explained in Things 1 to 5 above, they will have arrived at a data set that can be awarded three stars, according to Berners-Lee's scheme. This section and the following one explain how you can enhance the interoperability of your data sets even further by working with RDF and with persistent identifiers.
As a first step, it can be useful to explore whether some of the general topics that you focus on have already been assigned persistent identifiers or URIs. Many researchers and institutions have developed shared vocabularies and ontologies to standardise terminology. In many cases, the terms which have been defined have also been assigned persistent identifiers. Such shared vocabularies can make it clear that we are talking about the same thing when we exchange knowledge. Historical research often concentrates on people, events, organisations and locations. The following ontologies and shared vocabularies concentrate on entities such as these:
• The CIDOC Conceptual Reference Model (CRM) concept search.
• Wikidata assigns identifiers to a wide range of entities, including people, locations and organisations.
• The Library of Congress name authority files, e.g. http://id.loc.gov/authorities/names/n79021400.
https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989 https://5stardata.info/en/ http://www.cidoc-crm.org/concept-search http://id.loc.gov/
• VIAF, the Virtual International Authority File (https://viaf.org/).
• Identifiers for books published in Dutch or in the Netherlands can be found via the STCN, whose contents are available as Linked Open Data.
• The UNESCO history thesaurus.
• Aspects of books can be described using terms from the Bibliographic Ontology and the FABIO ontology.
• GeoNames provides persistent identifiers for locations, e.g. https://www.geonames.org/2751773/leiden.html.
• TaDiRAH and BARTOC (the Basel Register of Thesauri, Ontologies & Classifications) also offer valuable overviews of the ontologies that have been developed within specific disciplines.
• One of the ways to describe the provenance of data sets is by so-called nanopublications, i.e. sets of Resource Description Framework (RDF) triples (subject-predicate-object tuples). Although you do not need nanopublications to describe provenance, nanopublications are a way of combining argument and provenance in a single package. Nanopublications rely on the Provenance Ontology to express provenance.
You can read more about them and their application in historical research in this paper by Patrick Golden and Ryan Shaw: Nanopublication beyond the sciences: the PeriodO period gazetteer.
Where possible, try to use terms that have been defined in these existing ontologies in your own data set. An example where a specific vocabulary (the VOC glossary) was used to mark up a dataset can be found here. The dataset is part of a project to reconstruct the domestic market for colonial goods in the Dutch Republic.
ACTIVITIES:
1. Try to find one or two terms that are relevant to your research using the resources mentioned above. You can also use Swoogle to search for vocabularies related to your research.
2. Search for a term related to your research in the CIDOC Conceptual Reference Model (CRM) concept search. Were you able to find it? Tip 1: Search for "person" to get an idea of how the thesaurus works. Tip 2: All the terms used can be found in the last release of the model: http://www.cidoc-crm.org/get-last-official-release.

Thing 7: FAIR data modelling
The fourth and the fifth star in Berners-Lee's model can be awarded when the data are stored in a format in which the topics, their properties and their characteristics are identified using URIs whenever possible. More concretely, this implies that you record your data using the Resource Description Framework (RDF) format. RDF, simply put, is a technology which enables you to publish the contents of a database via the web. It is based on a simple data model which assumes that all statements about resources can be reduced to a basic form consisting of a subject, a predicate and an object. RDF assertions are also known as triples.
https://viaf.org/ http://openvirtuoso.kbresearch.nl/sparql http://vocabularies.unesco.org/browser/thesaurus/en/page/concept302 http://bibliontology.com/ https://sparontologies.github.io/fabio/current/fabio.html https://www.geonames.org/ https://www.geonames.org/2751773/leiden.html http://tadirah.dariah.eu/vocab/index.php https://bartoc.org/ https://peerj.com/articles/cs-44/ http://resources.huygens.knaw.nl/vocglossarium/index_html_en https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:75648 http://swoogle.umbc.edu/2006/ http://www.cidoc-crm.org/concept-search http://www.cidoc-crm.org/get-last-official-release
In a FAIR data model, elements of data are organised and identified using PIDs. The same goes for the relations between these elements. The FAIR data model is a graphical view of the data that acts as a metadata key to a spreadsheet, but it can also be used as a guide to expose data as a linked data graph in RDF format. Existing data sets can be converted to RDF by making use of the FAIRifier software. This application is based on OpenRefine. Other examples of tools to generate RDF are Karma and RML. In the FAIRifier, it is possible to upload a CSV file. After this, the data set can be connected to elements from existing ontologies.
ACTIVITIES:
1. Learn about the basics of RDF modelling by going through the first 15 slides of the webinar about the UNESCO Thesaurus.
2. Dig in deeper by exploring the FAIRifier for a dataset you already have available in CSV.
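To get a feel for what RDF triples look like in code, here is a minimal sketch using the rdflib library. The dataset URI, title and creator are invented placeholders; the location URI is the GeoNames identifier for Leiden mentioned in Thing 6, and the predicates come from the Dublin Core terms vocabulary.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
dataset = URIRef("https://example.org/dataset/colonial-goods")       # placeholder URI
leiden = URIRef("https://www.geonames.org/2751773/leiden.html")      # GeoNames PID for Leiden

# Each statement is a subject - predicate - object triple.
g.add((dataset, DCTERMS.title, Literal("Prices of colonial goods, 1700-1800")))  # invented title
g.add((dataset, DCTERMS.spatial, leiden))
g.add((dataset, DCTERMS.creator, Literal("A. Historian")))            # invented creator

# Serialise the graph as Turtle, a compact text format for RDF.
print(g.serialize(format="turtle"))
```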
Reusable

Thing 8: Licensing
A license describes the conditions under which your data or software is (re)usable. Picking a license can be a daunting process because of the common feeling that if you do not pick the right license, something will go wrong. However, keep in mind that if you do not choose a license at all, default copyright rules apply and others cannot legally use or reuse your data or software. A copyright expert can help you, but to get you going you can try out the activities listed below.
ACTIVITIES:
1. Try to pick a license for a data set you are working on by using the Creative Commons license picker.
2. Try to pick a license for a piece of software or code you are working on by using the Choose a License picker.
3. Learn more about licensing your data by reading this guide from the Digital Curation Centre.
If you deposit your data in a repository, there will be default options available.

Thing 9: Data citation
When you have made use of someone else's data, you are strongly recommended to attribute the original creators of these data by including a proper reference. Data sets, and even software applications, can be cited in the same way as textual publications such as articles and monographs. Structured data citations can also be used to calculate metrics about the reuse of the data. Data citations, regardless of citation style, typically contain the authors, the year, the title, the publisher and a persistent identifier.
https://github.com/DTL-FAIRData/FAIRifier/wiki http://openrefine.org/ http://usc-isi-i2.github.io/karma/ http://rml.io/ http://dublincore.org/resources/training/ASIST-Webinar-20160518/webinar-en.pdf http://vocabularies.unesco.org/browser/thesaurus/en/ https://creativecommons.org/choose/ https://choosealicense.com/ http://www.dcc.ac.uk/resources/how-guides/license-research-data
ACTIVITIES:
1. Read the ANDS guide on data citation.
2. Read the FORCE11 data citation principles.
3. Study the following data set on figshare: https://doi.org/10.6084/m9.figshare.3519755.v1. Note that it is possible to generate a data citation, under the link "Cite", in the citation style of your choice.
4. Consider the following publication: https://doi.org/10.1371/journal.pone.0149621. Note that the article has a "data availability" statement.
5. Explore CiteAs by typing in the figshare DOI from above (10.6084/m9.figshare.3519755.v1).

Context

Thing 10: Policies
Policies for data availability can come from publishers, funders and universities. These policies are listed on the respective websites, but finding them is not always straightforward. FAIRsharing is a repository for standards, databases and policies, with the possibility to filter information for a specific research domain. It started as an initiative for the life sciences but is rapidly expanding its content to other disciplines as well.
ACTIVITIES:
1. Start by going to FAIRsharing.
2. Click on the blue "Policies" button at the top.
3. In the left-side menu under "Subjects", click on "show more" and select "Humanities".
4. Scroll down to the Taylor and Francis Data Policy.
5. Which databases and standards are mentioned in this policy?
6. Go to the specific policy for the "European Review of History" journal.
7. Does it differ from the general Taylor and Francis policy?
8. Try to find the data policy for your favorite journal.
https://www.ands.org.au/__data/assets/pdf_file/0005/724334/Data-citation.pdf https://www.force11.org/datacitationprinciples https://doi.org/10.6084/m9.figshare.3519755.v1 https://doi.org/10.1371/journal.pone.0149621 http://citeas.org/ https://fairsharing.org/ https://fairsharing.org/bsg-p000115/ https://www.tandfonline.com/action/authorSubmission?journalCode=cerh20&page=instructions&#dsp

TOP 10 FAIR DATA & SOFTWARE THINGS: Geoscience

Sprinters: John Brown, Janice Chan, Niamh Quigley (Curtin University, Perth, Western Australia)

Audience: Researchers

Things
Findable
Thing 1: Data sharing and discovery
Thing 6: Vocabularies for data description
Thing 7: Identifiers and linked data
Thing 10: Spatial data
Accessible
Thing 2: Long-lived data: curation & preservation
Thing 3: Data citation for access & attribution
Thing 4: DOIs and citation metrics
Interoperable
Thing 4: DOIs and citation metrics
Thing 6: Vocabularies for data description
Thing 7: Identifiers and linked data
Thing 9: Exploring APIs and Apps
https://staffportal.curtin.edu.au/staff/profile/view/John.Brown https://github.com/icecjan/
Reusable
Thing 5: Licensing data for reuse
Thing 8: What are publishers & funders saying about data?

Thing 1: Data sharing and discovery
Activity 1: Data discovery
Data repositories enable others to find existing data by publishing data descriptions ("metadata") about the data they hold, much like a library catalogue describes the resources held in a library. Repositories often also provide access to the data itself, and some even provide ways for users to explore that data. Many research funders expect researchers to deposit their data into data repositories (which we'll discuss later in Thing 8). Data portals or aggregators draw together research data records from a number of repositories. Because of the huge amounts of data available, they sometimes focus on data from one discipline or geographic region. The EU Open Data Portal, for example, aggregates metadata records from over 30 European national data repositories, and the US Government's open data portal, data.gov, aggregates records from over 100 US government agencies.
1. Look at this Data.gov.au record from Geoscience Australia: Lord Howe Rise Marine Survey 2017.
• Examine the Description and Additional Info fields to see the ways that Geoscience Australia has made this record findable to other researchers. If you knew about this data portal, would you be able to easily find this dataset if it was relevant to your research?
2. Spend a few minutes exploring the Scottish Spatial Data Infrastructure Metadata Portal.
• Try browsing or searching on a topic of interest.
• Explore a record and see where it came from and if there's a way to contact the creator.
• Have a look at the map and see if you can find and add a map layer relating to fishing.
3. Look at EarthChem.
• Have a look at some of the data in EarthChem. Would it be a good place to contribute the data from your own research?
Consider: If your research appeared in the right data portal or repository, what things might result from that for yourself? What about your discipline?
Activity 2: Finding data repositories
1. Choose one of the specialised data repositories below, or find another data repository on re3data.org (perhaps one outside your particular focus area), and spend some time browsing around your chosen repository to get a feel for the data available.
https://data.europa.eu/euodp/en/home https://www.data.gov/ https://data.gov.au/ https://data.gov.au/dataset/lord-howe-rise-marine-survey-2017-ga-0363-kr17-15c-bathymetry-grids https://www.spatialdata.gov.scot/geonetwork/srv/eng/catalog.search#/home https://www.spatialdata.gov.scot/geonetwork/srv/eng/catalog.search#/map http://www.earthchem.org/ http://www.earthchem.org/data/templates https://www.re3data.org/browse/by-subject/
• WorldClim
• Southern California Earthquake Centre
• MOPITT (Atmospheric Science Data Centre)
• International Service of Geomagnetic Indices
• Scientific Drilling Database
• Alberta Geological Survey
2. Think about how the data here differs from data you are familiar with, for example in format, size and access method.
Consider: Could you apply a dataset from one of these repositories to your own work? Would you need to change file formats or learn a new software package?

Thing 2: Long-lived data: curation & preservation
Activity 1: Preserving born digital objects
Information sources commonly used in the past, such as maps and handwritten observation notes, can easily survive for years, decades or even centuries. However, because most current research is done on computers, it's important to remember that digital items require special care to keep them usable over time.
1. This video (2.5 min) from the US Library of Congress shows the vulnerability of "born digital" objects like research data: they are fragile; they are dependent on software and hardware; and they require active management.
2. Look at the ANDS page on file formats.
Consider: If your research was put into a time capsule and unearthed in 50 years' time, would future researchers be able to determine if your research is still useful to them? If you were allowed to update the time capsule every 5 years, what would you change to make it easier for those unearthing it?
Activity 2: Readme files
One way that researchers can ensure their data is useful in the future is to package their data with an explanation that can be opened without any special software. These explanatory files mean that anyone who finds the data will know whether the data is useful to them, and hopefully won't have any questions for the original researcher, who may not be available or may not remember. The files are usually called "readme" files in the hope that by reading the file, all the important questions will be answered.
1. Read the Guide to writing "readme" style metadata from the Cornell Research Data Management Service Group and create a readme.txt file for one of your own datasets.
http://worldclim.org/ https://www.scec.org/ https://eosweb.larc.nasa.gov/project/mopitt/mopitt_table http://isgi.unistra.fr/index.php http://www.scientificdrilling.org/ https://geology-ags-aer.opendata.arcgis.com/ http://www.bl.uk/learning/timeline/large126360.html http://4.bp.blogspot.com/-A2pzziCrac0/VhmMGIgnDQI/AAAAAAAAFA8/FKK0e5veaJA/s1600/IMG_9095.JPG https://youtu.be/qEmmeFFafUs https://www.ands.org.au/working-with-data/data-management/data-preservation https://data.research.cornell.edu/content/readme
Don't forget to include notes on software versions used, methodology and any special things you'd tell a colleague if you were giving them the data yourself!
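If it helps to have a starting point, the sketch below writes a readme skeleton to fill in. The headings are illustrative, loosely following the Cornell guide, not a prescribed template.

```python
# Write a readme skeleton next to your dataset; headings are illustrative.
README = """\
TITLE: <dataset title>
CREATOR: <name, ORCID>
DATE COLLECTED: <YYYY-MM-DD to YYYY-MM-DD>
DESCRIPTION: <what the data contains and why it was collected>
METHODOLOGY: <instruments, software and versions used>
FILE LIST: <name and short description of every file>
LICENSE: <e.g. a Creative Commons licence, see Thing 5>
CONTACT: <email>
"""

with open("readme.txt", "w", encoding="utf-8") as fh:
    fh.write(README)
print("Wrote readme.txt -- now fill in the placeholders!")
```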
Thing 3: Data citation for access & attribution
Activity 1: Citing research data
When authors cite an article they have used ideas from, they formally and publicly acknowledge the work of the earlier author. Data citation works in the same way: by citing the data created by earlier researchers, those researchers get formal and public credit for their contribution to the new work. Along with books, journals and other scholarly works, it is now possible to formally cite research datasets, and even the software that was used to create or analyse the data.
1. Have a look at the "Geophysical, hydraulic and mechanical properties of synthetic versus natural sandstones under variable stress conditions" dataset from the British Geological Survey: https://www.bgs.ac.uk/services/ngdc/citedData/catalogue/a59128b5-8e7f-4100-b0ff-87325438435b.html. If someone wanted to use this dataset for further research, would they know how to give credit to the creator of the original dataset?
2. Find a DOI of a dataset from one of the repositories you found in Thing 1 and enter it into the DOI Citation Formatter: https://citation.crosscite.org/. If you saw the citation, would you know how to go about accessing the data?
3. Read the article "Sharing Detailed Research Data Is Associated with Increased Citation Rate". Why would it be that papers that make their data openly available get better citation counts? Would you feel more confident citing another person's work if you knew you could inspect the underlying data?
Consider: Data citation is a relatively new concept in the scholarly landscape and, as yet, is not routinely done by researchers or demanded by journals. What could be done to encourage routine citation of research data and software associated with research outputs?
Activity 2: Citing software
The increase in available computational power over the last 50 years has led to a massive increase in the usage of computational analysis methods in geoscience. As such techniques become more commonplace, it's important to distinguish between the data itself, the tools used to analyse the data, and any discrete components within those tools. In some cases, a particular function of the software is critical to the analysis process; in other cases, the critical part is an interchangeable block of code within that software package. Recognising the difference between these two is important, as it changes who gets credit for their previous work and who gets left unsung.
https://www.bgs.ac.uk/services/ngdc/citedData/catalogue/a59128b5-8e7f-4100-b0ff-87325438435b.html https://citation.crosscite.org/ https://doi.org/10.1371/journal.pone.0000308
It's not always easy to know which to cite, but trying to give recognition for the creation of software and software components can make a huge impact on the career of a researcher, especially if they create scientific software!
1. Read the How to cite software guide from the MIT Libraries: https://libguides.mit.edu/c.php?g=551454&p=3900280.
2. Read the Adding CITATION to your R package blog post.
Consider: If you wrote a package of code and made it freely available to your colleagues to solve a problem in your field, would they know how they could give you credit in their work?
Would they think that you would want attribution?

Thing 4: DOIs and citation metrics
DOIs are unique identifiers that enable data citation, metrics for data and related research objects, and impact metrics. Citation analysis and citation metrics are important to the academic community. Find out where data fits in the citation picture.
Activity 1: DOIs
Digital Object Identifiers (DOIs) are a type of 'persistent identifier'. DOIs are unique identifiers that provide persistent access to published articles, datasets, software versions and a range of other research inputs and outputs. There are over 120 million DOIs in use, and in 2016 DOIs were "resolved" (clicked on) over 5 billion times! Each DOI is unique, but a typical DOI looks like this: http://dx.doi.org/10.4225/06/577F022BA6954
1. Start by watching this short 4.5-minute video, Persistent identifiers and data citation explained, from Research Data Netherlands. It gives you a succinct, clear explanation of how DOIs underpin data citation.
2. Have a look at the poster Building a culture of data citation and follow the arrows to see how DOIs are attached to data sets and are used in data citation.
3. Let's go to a Research Data Australia data record which shows how DOIs are used. Click on this DOI to 'resolve' it and take us to the record: http://dx.doi.org/10.4225/06/577F022BA6954.
4. Click on the Cite icon on the upper left of the record (under the green Access the data tab). No matter where the DOI appears, it always resolves back to its original dataset record to avoid duplication, i.e. many records, one copy.
5. DOIs can also be applied to grey literature, a term that refers to research that is either unpublished or has been published in non-commercial form, such as government reports. For example, reports like this: http://doi.org/10.4225/06/583d354b89060.
https://www.r-bloggers.com/adding-citation-to-your-r-package/ https://youtu.be/PgqtiY7oZ6k https://www.ands.org.au/__data/assets/pdf_file/0003/383025/data_citation_poster.pdf
Activity 2: IGSNs
The International Geo Sample Number (IGSN) is designed to provide an unambiguous, globally unique, persistent identifier for physical samples. It facilitates the location, identification, and citation of physical samples used in research. Each IGSN is unique, but a typical IGSN looks like this: IEEVB00C3. The first five characters of the IGSN represent a namespace (a unique user code) that uniquely identifies the person or institution that registers the sample. The last four characters of the IGSN are a random string of alphanumeric characters (0-9, A-Z).
1. Start by reading this brief introduction to IGSN.
2. Review the scope and capability of each IGSN allocation agent listed on the IGSN website and consider which allocation agent is most appropriate for your samples.
3. Have a look at an IGSN record, https://app.geosamples.org/sample/igsn/IEEVB00C3, which displays what information about the sample was recorded.
4. Now have a look at how IGSNs are referenced in a dataset record: http://get.iedadata.org/doi/100548.
Consider: How are you managing your physical samples? The ANDS IGSN minting service may be used by Australian researchers at no cost. Do you know of a service provider in your region?
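Following up on the IGSN structure described in Activity 2, a toy parser that splits an IGSN into its two documented parts:

```python
# The first five characters of an IGSN are the registrant's namespace,
# the rest is the sample code (as described in Activity 2 above).
def split_igsn(igsn):
    return igsn[:5], igsn[5:]

namespace, sample_code = split_igsn("IEEVB00C3")
print(namespace, sample_code)  # -> IEEVB 00C3
```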
Activity 3: Altmetrics
Data citation best practice, as discussed in Thing 3, enables citation metrics for data to be tracked and analysed. Data citations are available from the Clarivate Data Citation Index, which is a commercial product. Altmetrics are an alternative measure to help understand the influence of your work. They refer to metrics such as the number of views, number of downloads, and number of mentions in policy documents, social media, and social bookmarking platforms associated with any research output that has a DOI or other persistent identifier. Because of their immediacy, altmetrics can be an early indicator of the impact or reach of a dataset, long before formal citation metrics can be assessed.
1. Start by looking at the altmetrics for this phylogenomics article published in Science. Note the usage statistics, including the number and pattern of downloads, for this article since it was published in November 2014.
2. Now click on the "donut" or the link to 'See More Details' to see the wealth of information available.
https://app.geosamples.org/sample/igsn/IEEVB00C3 https://www.ands.org.au/working-with-data/citation-and-identifiers/igsn http://www.igsn.org/register-your-samples http://get.iedadata.org/doi/100548 http://www.sciencemag.org/articleusage?gca=sci;346/6210/763
3. Look also at the associated data in Dryad, noting that the data has been assigned a DOI. Can you see how many times the data has been downloaded and the record viewed (scroll down to the bottom of the record)? By way of comparison, as of early November 2018:
* the same dataset had been cited once in the Web of Science Data Citation Index
* the article had been cited 690 times in Web of Science
Consider: Do you think altmetrics for data have value in academic settings? Why, or why not?

Thing 5: Licensing data for reuse
Understand the importance of data licensing, learn about Creative Commons, and find out how enabling reuse of data can speed up research and innovation.
Activity 1: Why license research data?
Consider this scenario: You've found a dataset you are interested in. You've downloaded it. Excellent! But do you know what you can and cannot do with the data? The answer lies in data licensing. Licensing is critical to enabling data to be reused and cited.
1. Start by reading this brief introduction to licensing research data.
2. Now watch this Creative Commons licensing introductory video or have a closer look at the Understanding CC Licences poster.
3. Check out the licence chooser from Creative Commons, which walks you through the decision of which licence is appropriate for your purpose.
Consider: If you were considering licensing a dataset on something which may have commercial value to others, what licence would you apply?
Activity 2: Data licences: unlock data for innovation
Enabling reuse of data can speed up research and innovation. Licensing is critical to enabling data reuse.
1. Start by watching this 4:30 min video in which Dr Kevin Cullen from the University of New South Wales explains their approach to licensing, which aims to strengthen the University's relationship with business and industry.
2. Check out the data standards of Geoscience Australia, which refer to the Australian government policy on Public Data. Which Creative Commons licence is applied to government data by default?
3. Since November 2009, Geoscience Australia has officially adopted Creative Commons Attribution as the default licence for its website. That means thousands of products and datasets available through the website are free to be reused.
http://datadryad.org/resource/doi:10.5061/dryad.3c0f1 http://apps.webofknowledge.com/full_record.do?product=WOS&search_mode=GeneralSearch&qid=2&SID=E4hcr2sIg7gEPv5OcTf&page=1&doc=3 https://www.ands.org.au/working-with-data/publishing-and-reusing-data/licensing-for-reuse https://youtu.be/FsTO6ink3oI https://www.ands.org.au/__data/assets/image/0008/774296/CCposter.png https://creativecommons.org/choose/ https://youtu.be/LmyzF7iJp3E?list=PLG25fMbdLRa7QH8_yyNSgzkQOTBVsTK2r http://www.ga.gov.au/data-pubs/datastandards https://pmc.gov.au/resource-centre/public-data/australian-government-public-data-policy-statement http://creativecommons.org.au/blog/2009/12/more-on-government-data-geoscience-australia-goes-cc/
4. See the range of data products and licences available at the British Geological Survey. Does your institution have policies or guidelines around data licensing?
Activity 3: Data licensing in practice
Not all research data that is shared is licensed for reuse. It should be!
1. Explore the following data repositories:
• Research Data Australia
• AuScope Geonetwork Portal
• EarthChem
2. Or review the following example records:
• Darwin Harbour marine habitats
• Mineral Occurrences - South Australia
• Whole Rock Composition Data for Garnet Pyroxenites from Arizona
3. Do all data repositories or metadata catalogues enable users to refine a search by licence? Look closely at the specific licensing information on a small sample of those records with 'open' licences. How easy or difficult is it to work out whether the data can or can't be reused, e.g. for commercial purposes, or with international collaborators?
Consider: Assigning open licences is not routine. Suggest one tip for encouraging uptake of 'open' licensing.

Thing 6: Vocabularies for data description
In addition to selecting a metadata standard or schema, whenever possible you should also use a controlled vocabulary.
Activity 1: What is a controlled vocabulary?
A controlled vocabulary provides a consistent way to describe data: location, time, place name, subject. Read this short explanation of controlled vocabularies. Controlled vocabularies significantly improve data discovery. They make data more shareable with researchers in the same discipline, because everyone is 'talking the same language' when searching for specific data, e.g. plants, animals, medical conditions, places, etc. If you have time, have a look at Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Make sure you scroll down to 5. Conclusion - it's worth a read.
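To make the idea concrete, here is a toy sketch that maps free-text species names, as a citizen scientist might type them, onto a single controlled term. The vocabulary is invented for illustration; real systems use published thesauri like those listed in Activity 3 below.

```python
# Invented mini-vocabulary: free-text variants -> one controlled term.
CONTROLLED = {
    "humpback": "Megaptera novaeangliae",
    "humpback whale": "Megaptera novaeangliae",
    "megaptera novaeangliae": "Megaptera novaeangliae",
}

# Three ways of writing the same animal all resolve to one searchable term.
for raw in ["humpback", "Humpback Whale", "Megaptera novaeangliae"]:
    print(f"{raw!r} -> {CONTROLLED.get(raw.lower(), 'UNMAPPED')}")
```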
https://www.bgs.ac.uk/data/licensing/home.html https://researchdata.ands.org.au/ http://portal.auscope.org/geonetwork http://www.earthchem.org/ https://researchdata.ands.org.au/darwin-harbour-marine-habitats/685223 http://portal.auscope.org/geonetwork/srv/eng/catalog.search;jsessionid=9239F91DD91D546FB5229E2CF054A033#/metadata/37334122424b82f003cd4d88d0877ec45d2b4c35 http://get.iedadata.org/doi/111138 https://stats.oecd.org/glossary/detail.asp?ID=6260 http://www.webarchive.org.uk/wayback/archive/20160101151732/http:/www.jiscdigitalmedia.ac.uk/guide/controlling-your-language-links-to-metadata-vocabularies
Activity 2: Controlled vocabularies in action
We are going to see some controlled vocabularies in action in the Atlas of Living Australia (ALA).
1. Do a search in the ALA search engine. Type "whale" in the search box and click on search. Choose one of the records listed and click on the (red text) View record link.
2. Any metadata field where you see Supplied... tells you that the information supplied by the person who submitted the record (often a 'citizen scientist') has been changed to the controlled vocabulary being used in metadata fields, e.g. Observer, Record date and Common name.
3. Scroll down the record and consider how many of the metadata fields probably have a controlled vocabulary in use (e.g. taxonomy, geospatial, etc.).
If you have time: have a browse around the stunning level of data description and data contained in the Atlas of Living Australia.
Activity 3: Geoscience vocabularies
Explore some examples of vocabularies used in geoscience:
• American Geosciences Institute GeoRef Thesaurus
• Geological Survey of Western Australia Geoscience Thesaurus (GeMPeT)
• Geoscience Australia vocabularies register
• British Geological Survey Vocabularies
Consider: Do you use controlled vocabularies to describe your data? How would you encourage other researchers to use them?

Thing 7: Identifiers and linked data
ORCID is a unique identifier for researchers. Many research data repositories record your ORCID when you submit research data for publication.
Activity 1: Check your ORCID
In your ORCID record, datasets you have published will be displayed in the Works section. Log into ORCID now and check your details are up to date, including:
* email address
* biography
* research keywords
* other IDs, such as a Scopus Author ID.
If you don't already have an ORCID, you can get one; this Curtin University webpage has information on how to get the most out of your ORCID.
https://www.ala.org.au/data-sets/ http://www.ala.org.au/ https://www.americangeosciences.org/georef/georef-thesaurus-lists http://www.dmp.wa.gov.au/Geoscience-Thesaurus-GeMPet-1564.aspx http://ldweb.ga.gov.au/def/voc/ga/ https://www.bgs.ac.uk/data/vocabularies/home.html https://orcid.org/ https://orcid.org/signin http://libguides.library.curtin.edu.au/c.php?g=202410&p=6157895
Activity 2: Get more from your ORCID
ORCID populates your record from many sources, one of which is peer review activities. Publishers such as the American Geophysical Union now send details of peer review activities to ORCID.
• Look at your ORCID record: if you have undertaken peer review activities, are they listed?
• Why do you think linking peer review activities to ORCIDs could be useful?
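Because ORCID iDs are machine-resolvable identifiers, public records can also be read programmatically. A minimal sketch against ORCID's public v3.0 API; the iD below is the example iD used in ORCID's own documentation, and the JSON field names are as I understand the v3.0 schema, so treat this as a sketch.

```python
import requests

orcid_id = "0000-0002-1825-0097"  # ORCID's documentation example iD

# Ask the public ORCID API for the record as JSON.
resp = requests.get(
    f"https://pub.orcid.org/v3.0/{orcid_id}/record",
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

name = resp.json()["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```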
Activity 3: Identifiers and linked data
Because they are unique identifiers, ORCIDs can be used to link data from different datasets together. GeoLink is a network of Linked Data from multiple data repositories.
1. Go to the portal for the GeoLink demo.
2. Choose an entity, e.g. Datasets, Cruises, Vessels, Instruments, Researchers, and explore! The Help guide is here.

Thing 8: What are publishers & funders saying about data?
Geoscience research data is part of our shared scientific heritage. Researchers share with research institutions and funders the responsibility of ensuring their data is well-documented, preserved and openly available. Many publishers have special requirements for the citation of data in publications. This can be in the form of compliance with a data policy, author guidelines or the completion of a Data Availability Statement.

Activity 1: Research data and scholarly publishing
Have a look at the Nature Data Availability Statement examples or the PLOS Data Availability policy to get an idea of what publishers expect. COPDESS, the Coalition for Publishing Data in the Earth and Space Sciences, has collected links to author instructions and data policies for some geoscience journals, publishers and funders.

Activity 2: Research funders and data sharing
Activity 1 has shown us that it's becoming more common for journals and publishers to demand your data be made available when you seek to publish. However, if your research is publicly funded it's almost guaranteed that your grant and funding obligations will require you to make your data publicly available at the end of your project: the outputs of research funded by a population should be made available to that population.

https://eos.org/agu-news/agu-opens-its-journals-to-author-identifiers http://demo.geolink.org/ http://demo.geolink.org/help/index.html https://sciencepolicy.agu.org/files/2013/07/AGU-Data-Position-Statement-Final-2015.pdf https://www.nature.com/authors/policies/data/data-availability-statements-data-citations.pdf https://journals.plos.org/plosone/s/data-availability http://www.copdess.org/datapolicies/

The Australian Research Council's data management requirements state that funded researchers are expected to follow the OECD Principles and Guidelines for Access to Research Data from Public Funding. Similar principles are outlined by UK Research and Innovation (UKRI) in their Guidance on best practice in the management of research data.

Consider: If you were on a funding panel and were asked to assess a grant with a clear plan for making the data openly available, would you rate the future impact of that proposal better or worse than one with a poorly defined plan?

Thing 9: Exploring APIs and applications
Geoscience has many specialised services, applications and APIs which can be used to directly access and harness existing research data. Some are free, and some are subscription-based, but your research institution may have access.

Activity 1: Try an app
• The WA Geology app, created by the Western Australian government, can be used in a mobile web browser and provides multiple layers of geoscience information for Western Australia.
• The British Geological Survey has created the free iGeology app to explore hundreds of British maps.
Activity 2: APIs
APIs (Application Programming Interfaces) are software services that allow you to access structured data or systems held by someone else. These are usually provided so that developers can access data held by an organisation on demand, rather than having to hold an entire dataset themselves (which may not be possible due to security, space requirements, or because the dataset is constantly changing). Some companies charge for using their APIs, but many research-oriented organisations provide their APIs for free so that other organisations can link in to their knowledge. A minimal example of calling such an API follows this activity.
• The NASA Earth Data Developer Portal provides data from the NASA Earth Science Data portal.
• The Natural History Museum API provides a range of data from their collections.
Consider: If you could systematically access and integrate the data provided from one of the sources above, can you think of a way you could enrich the outputs of your own research?

https://www.arc.gov.au/policies-strategies/strategy/research-data-management http://www.oecd.org/sti/inno/38500813.pdf https://www.ukri.org/about-us/ https://www.ukri.org/files/legacy/documents/rcukcommonprinciplesondatapolicy-pdf/ http://www.dmp.wa.gov.au/Geology-mapping-app-for-mobile-1567.aspx https://www.bgs.ac.uk/igeology/ https://developer.earthdata.nasa.gov/ https://earthdata.nasa.gov/ http://data.nhm.ac.uk/about/download http://www.nhm.ac.uk/our-science/collections.html
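As a concrete illustration, here is a minimal sketch of calling such an API from a script. It assumes the Natural History Museum's data portal exposes the standard CKAN action API (the portal is built on CKAN); the search term and row count are arbitrary illustrations, not part of the original activity.

```python
import requests

# Minimal sketch: query a CKAN-backed data portal. The NHM Data Portal is
# CKAN-based, so the standard 'package_search' action is assumed here.
BASE = "https://data.nhm.ac.uk/api/3/action/package_search"

response = requests.get(BASE, params={"q": "butterflies", "rows": 5}, timeout=30)
response.raise_for_status()
result = response.json()["result"]  # CKAN wraps results in {"result": {...}}

print(f"{result['count']} matching datasets; first {len(result['results'])}:")
for dataset in result["results"]:
    print("-", dataset["title"])
```

The same pattern (an HTTP GET returning JSON you can loop over) applies to most research data APIs, including the NASA Earthdata services linked above.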
Thing 10: Spatial data
The importance of spatial data is ever increasing. Many of the societal challenges we face today, such as food scarcity and economic growth, are inherently linked to big spatial data. In fact, it is often said that 80% of all research data has a geographic or spatial component. It is useful, then, for all of us to have an understanding of spatial data.

Activity 1: Spatial data: Maps and more
1. Start by watching this incredible, inspiring video (3:59 min) from the University of Wollongong's PetaJakarta project. It shows innovative ways of combining social media and geospatial data to save lives.
2. Now read The Application of Geographic Information Science in Earth Sciences.
3. This video combines a range of different data visualisations depicting the human impacts on our environment.
4. Geospatial data is fundamental to Australia's economic future. Check out this very short article about how Geoscience Australia is mapping the mineral potential of our continent - a world first!
Just for fun: Enter your address in the Atlas of Living Australia and see what birds and plants have been reported in your street or suburb. You may be surprised at how 'alive' your street is!
Consider: Why do you think these geospatial visualisations are so powerful?

Activity 2: Spatial data concepts
There are many types and sources of geospatial data. If you are new to the world of geospatial data, you will probably appreciate some 'busting' of the jargon of geospatial data.
1. Start by reading this Fundamentals Chapter to learn more about maps, projections, coordinate systems, datums and GIS.
2. Want more? Continue with this blog about Finding and Making Sense of Geospatial Data on the Internet, which explains some basic geospatial data file formats and concepts.
3. Prefer watching? Most of these concepts are also explained in this video.
4. Read more about two important aspects of spatial data: scale and resolution.
Consider: How would you explain two new terms you have just learnt?

Activity 3: Using and visualising spatial data
Spatial data can be used in many ways, and there are many tools that you can use to manipulate and display spatial data.

https://www.youtube.com/watch?v=6v7BO8_rhWI&feature=youtu.be https://gis.usc.edu/blog/the-application-of-geographic-and-information-science-gis-in-earth-sciences/ http://spatial.ly/2013/11/climate-change-state-science/ http://www.ga.gov.au/news-events/news/latest-news/continental-scale-mapping-of-mineral-potential-wins-top-award https://biocache.ala.org.au/explore/your-area http://vcgi.vermont.gov/sites/vcgi/files/training/chapter_1.pdf https://blog.openshift.com/finding-and-making-sense-of-geospatial-data-on-the-internet/ https://www.youtube.com/watch?v=lelnsbJ7VWo&t=28s http://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/cell-size-of-raster-data.htm

You can try one of the tools below. Do one, or do them all and compare the results.
1. 13 Free GIS Software Options: Map the World in Open Source
• Browse through this site for ideas for free, open source geospatial software; the descriptions often include discipline-specific advice. Download one and try your hand at mapping.
2. Spatial data visualisation with R: For those who have done the R modules in Software Carpentry - this might be a good activity to flex your R muscles! Want more? Here are some more R tutorials.
3. Create a map using Google Fusion Tables: This offers lots of features, but you need a Google account. The excellent Google Fusion tutorial uses butterfly data to show you how to import data, map the data and customise your map.

The Open Geospatial Consortium (OGC) is an international not-for-profit organisation that develops open standards for the geospatial community. OGC, through its dedicated global members, has developed several standards for sharing geospatial data. Some of the most commonly used standards are:
1. Web Map Service (WMS): A standard web protocol to query and access geo-registered static map images as a web service. The outputs are images that can be displayed in a browser application.
2. Web Feature Service (WFS): A standard web protocol to query and extract geographic features of a map; these are typically attributes of a map. The latest version of WFS (3.0, Dec 2017) has created a lot of excitement in the community.
3. Web Coverage Service (WCS): Provides access to geospatial information representing phenomena that are variable over space and time, such as satellite images or aerial photos. The service delivers a raster image that can be further interpreted and processed.
GeoServer is the most popular open source reference implementation of the WMS, WFS and WCS standards. A sketch of a WMS request follows.
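To make the WMS description concrete, here is a minimal sketch of a GetMap request built from the standard WMS 1.3.0 parameters. The endpoint, layer name and bounding box are placeholders, not real services; substitute values from a real server's GetCapabilities response.

```python
import requests

# Minimal sketch of a WMS 1.3.0 GetMap request. The server URL and layer
# name below are hypothetical; any WMS endpoint will accept these
# standard parameters.
WMS_ENDPOINT = "https://example.org/geoserver/wms"  # placeholder server

params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "example:layer",     # layer names come from GetCapabilities
    "styles": "",                  # mandatory parameter; empty = default style
    "crs": "EPSG:4326",            # WMS 1.3.0 uses 'crs' (1.1.x used 'srs')
    "bbox": "-44,112,-10,154",     # lat/lon axis order for EPSG:4326 in 1.3.0
    "width": 600,
    "height": 300,
    "format": "image/png",
}

response = requests.get(WMS_ENDPOINT, params=params, timeout=60)
response.raise_for_status()
with open("map.png", "wb") as fh:
    fh.write(response.content)  # a static map image, viewable in any browser
```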
Consider: The data world is hungry for geospatial tools and metadata, and there is growing demand for people with these skills. How can these skills be encouraged in your institution?

References:
ANDS 23 (Research Data) Things - https://www.ands.org.au/working-with-data/skills/23-research-data-things/all23
10 Eco Data Things - https://www.ands.org.au/__data/assets/pdf_file/0003/1376121/10-Eco-Data-Things_handout.pdf

https://gisgeography.com/free-gis-software/ https://www.r-bloggers.com/spatial-data-visualization-with-r-2/ https://www.researchgate.net/publication/274697165_Spatial_data_visualisation_with_R http://pakillo.github.io/R-GIS-tutorial/ https://support.google.com/fusiontables/answer/2527132?hl=en&ref_topic=2592806 http://www.opengeospatial.org/ http://www.opengeospatial.org/standards/wms http://www.e-cartouche.ch/content_reg/cartouche/webservice/en/html/wfs_whatWFSis.html https://medium.com/@cholmes/wfs-3-0-get-excited-yes-8e904fdbcc0 http://www.opengeospatial.org/standards/wcs http://geoserver.org/

TOP 10 FAIR DATA & SOFTWARE THINGS: Biomedical Data Producers, Stewards, and Funders

Sprinters: Lisa Federer (National Library of Medicine), Douglas Joubert (National Institutes of Health Library), Allissa Dillman (National Center for Biotechnology Information), Kenneth Wilkins (National Institute of Diabetes and Digestive and Kidney Diseases), Ishwar Chandramouliswaran (National Institute of Allergy and Infectious Diseases), Vivek Navale (NIH Center for Information Technology), Susan Wright (National Institute on Drug Abuse)

Audience:
• Biomedical researchers
• Data stewards
• Funding organizations

Things

Thing 1: Metadata creation and curation
Beginner activity:
1. Learn about the various types of metadata. DataONE defines metadata as "documentation about the data that describes the content, quality, condition, and other characteristics of a dataset. More importantly, metadata allows data to be discovered, accessed, and reused" - DataONE Education Module.
• Descriptive
• Technical
• Administrative
• Provenance
2. Work through the DataONE Metadata Educational Module: Lesson 7 - Metadata.
3. Explore the use of controlled vocabularies and Common Data Elements (CDE). A CDE is a "data element that is common to multiple data sets across different studies." The NIH Common Data Element (CDE) Resource Portal has identified CDEs for use in particular types of research or research domains after a formal evaluation and selection process.
• Take the NIH CDE interactive tour to learn how to use the site.
• Browse the CDEs to explore how these might be used in your discipline.

https://github.com/informationista https://github.com/doujouDC https://twitter.com/dchackathons https://www.niddk.nih.gov/about-niddk/staff-directory/biography/wilkins-kenneth https://www.linkedin.com/in/ishwarc/ https://www.rd-alliance.org/users/vivek-navale https://www.drugabuse.gov/about-nida/organization/divisions/division-basic-neuroscience-behavioral-research-dbnbr/office-director-od https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6018669/ https://www.dataone.org/education-modules https://www.nlm.nih.gov/cde/glossary.html#cdedefinition https://www.nlm.nih.gov/cde/ https://cde.nlm.nih.gov/home?tour=yes

Intermediate activity:
1. Think about ways you can standardize minimal/core metadata for use across disciplines (for example, crosswalks between standards).
2. Automated metadata creation can "help improve efficiency in time and resource management within preservation systems, and alleviate the problems associated with the 'metadata bottleneck'".
3. Review the Digital Curation Centre (DCC) Automated Metadata Generation primer page.
4. Download the DCC Digital Curation Reference Manual and think about the ways you might be able to automate metadata creation at your organization.
5. Watch the ALCTS Session 1: Automating Descriptive Metadata Creation: Tools and Workflows webinar, which examines workflows for automating the creation of descriptive metadata.

Thing 2: Use of standard data models
1. Explore the OMOP Common Data Model (CDM), which allows for the systematic analysis of disparate observational databases.
2. Review one of the OMOP Community Meeting presentations and think about how this might align with the work of your organization.
3. Familiarize yourself with one of the Observational Health Data Sciences and Informatics GitHub repositories.

Thing 3: Exploring unique, persistent identifiers
Beginner activity:
Globally unique and persistent identifiers remove ambiguity in the meaning of your published data by assigning a unique identifier to every element of metadata and every concept/measurement in your dataset (GO FAIR).
1. Explore the GO FAIR F1 webpage to see examples of globally unique and persistent identifiers.
2. Learn how a Digital Object Identifier (DOI) can be used to create a unique reference to your data. Watch a video that explains what DOIs are, how they work, and how they benefit managers of digital content.
3. Read the Digital Preservation Handbook to learn about all of the elements that comprise a persistent identifier.

https://cde.nlm.nih.gov/cde/search https://www.ands.org.au/online-services/rif-cs-schema/crosswalks-transform-your-metadata http://www.dcc.ac.uk/resources/curation-reference-manual/completed-chapters/automated-metadata-extraction http://www.dcc.ac.uk/webfm_send/1513 http://www.ala.org/alcts/events/ac/2016/vc-sess1 https://www.ohdsi.org/data-standardization/the-common-data-model/ https://www.ohdsi.org/resources/presentations/community-meeting-presentations/ https://github.com/OHDSI https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/ http://www.doi.org/driven_by_DOI.html https://www.dpconline.org/handbook/technical-solutions-and-tools/persistent-identifiers

Intermediate activity:
ORCID allows you to create persistent digital identifiers for authors.
1. Create an ORCID iD.
2. Link your ORCID with CrossRef and DataCite.
3. Then, go through the steps included in the Getting Started with ORCID Integration guide.
4. Test the ORCID Application Programming Interface (API); a minimal sketch follows.
5. As a best practice, use ORCIDs from the start of data creation. For example, you can attach the data creator's name/ORCID to a dataset as a metadata field. Include ORCIDs with datasets in repositories (e.g. in the Sequence Read Archive (SRA), include the ORCID for the data creator). This allows for the tracking of your research and enables citation of your data.

https://orcid.org/ https://orcid.org/register https://orcid.org/members/001G000001C8dNEIAZ-crossref https://orcid.org/members/001G000001G9QIUIA3-datacite https://members.orcid.org/api/getting-started https://orcid.org/content/register-client-application-sandbox https://www.ncbi.nlm.nih.gov/sra
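As a starting point for step 4, here is a minimal sketch of fetching a record from ORCID's public API. The iD shown is ORCID's long-standing example iD; the JSON field paths follow the v3.0 record schema, so treat them as an assumption to verify against the API documentation.

```python
import requests

# Minimal sketch: retrieve the public record for an ORCID iD from the
# public API at pub.orcid.org. The iD below is ORCID's example iD.
orcid_id = "0000-0002-1825-0097"

url = f"https://pub.orcid.org/v3.0/{orcid_id}/record"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
record = response.json()

# Field layout per the v3.0 schema: person -> name -> given-names/family-name.
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```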
Thing 4: Versioning and data "retirement"
Beginner activity:
A source-code repository is a file archive and web hosting facility where source code for software, web pages, and other resources is kept, either publicly or privately. Advantages of versioning include:
1. Persistence of identifiers pointing to different/earlier versions
2. Maintenance of previous versions of code, software, and data
3. Sharing of various levels of processed data (primary, secondary, or raw/clean/processed, etc.)
4. De-accessioning of data that has reached the end of its life cycle
Intermediate activity:
1. GitHub is one of the most popular options for code hosting. Explore alternative options for code hosting.
2. Work through the Library Carpentry Introduction to GitHub module.

Thing 5: Linking research objects
Beginner activity:
1. Read the following article on managing digital research objects.
2. Read the linking data CrossRef page.
Intermediate activity:
1. Using a GitHub code repository or Zenodo, try to find data that goes with a published paper. Then answer some of the following questions:
• Where is the data or code stored (for example, a GitHub repo or Zenodo)?
• Who created the objects (ORCID)?
• Was there proper documentation? License information (regarding commercial use)?

https://en.wikipedia.org/wiki/Comparison_of_source-code-hosting_facilities https://en.wikipedia.org/wiki/List_of_most_popular_websites https://opensource.com/article/18/8/github-alternatives https://librarycarpentry.org/lc-git/ https://datascience.codata.org/articles/10.5334/dsj-2018-016/ https://www.crossref.org/community/linking-data/ https://github.com/ https://zenodo.org/

Thing 6: Human and machine readability
1. Read about the FAIR principles for making your code both human and machine readable, and the FAIR Guiding Principles article.
2. Read the following report, Jointly designing a data FAIRPORT, from the Lorentz Center.
3. Code that is both human and machine readable supports:
• API access
• automatic integration of multiple datasets
• use of standard formats widely accepted in the discipline

Thing 7: Maintain/preserve the entire research environment (e.g. software)
1. Familiarize yourself with best practices for scientific computing. Read Good Enough Practices in Scientific Computing and Top 10 Metrics for Life Science Software Good Practices to familiarize yourself with the topics of containers, software preservation, and software emulation.
2. Read more about the Long-term preservation of biomedical research data.

Thing 8: Indexing repositories to enable findability
1. re3data.org is a global registry of research data repositories from different academic disciplines. If the repository where you deposit data is missing, you can register it with re3data.org.
2. Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data.
• Read their Getting Started Guide to get indexed with Google. (A sketch of schema.org dataset markup follows this Thing.)
3. Link your ORCID account to fairsharing.org, verify your email address, and create a public profile.
• Familiarize yourself with their Standards, Databases, Policies, and Collections.
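To illustrate the Schema.org activity, here is a minimal sketch that emits schema.org Dataset markup as JSON-LD, the structured format crawlers look for on dataset landing pages. All values, including the DOI, are hypothetical.

```python
import json

# Minimal sketch: a schema.org 'Dataset' description as JSON-LD.
# Every value below is illustrative, not a real dataset.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example cohort summary statistics",
    "description": "De-identified summary statistics from a hypothetical study.",
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # hypothetical DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

# Embed this block in the dataset's landing page so it can be indexed.
print('<script type="application/ld+json">')
print(json.dumps(dataset, indent=2))
print("</script>")
```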
Thing 9: Broad consent
Informed consent for human subjects should be broad enough to make reuse possible. See Broad Consent for Research with Biological Samples: Workshop Conclusions. Also see Recommendations for Broad Consent Guidance from the Office for Human Research Protections.

https://www.force11.org/fairprinciples https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/ https://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602 https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510 https://f1000research.com/articles/5-2000/v1 https://f1000research.com/articles/7-1353/v1 https://www.re3data.org/about https://schema.org/ https://schema.org/docs/gs.html https://fairsharing.org/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4791589/ https://www.hhs.gov/ohrp/sachrp-committee/recommendations/attachment-c-august-2-2017/index.html

Thing 10: Application of metrics to evaluate the FAIRness of (data) repositories
Beginner activity:
1. Explore the work of the FAIR Metrics Group and their proposed FAIR Metrics.
2. Read the following paper: Evaluating FAIR-Compliance Through an Objective, Automated, Community-Governed Framework.
3. Explore the design framework for exemplar metrics for FAIRness.
Intermediate activity:
1. Explore the Make Data Count Project, where you can learn about the COUNTER Code of Practice as well as the Code of Practice for Research Data Usage Metrics.
2. Learn how Zenodo and DataONE have responded to the Make Data Count recommendations.

http://fairmetrics.org/ https://github.com/FAIRMetrics/Metrics https://www.biorxiv.org/content/biorxiv/early/2018/09/16/418376.full.pdf https://www.nature.com/articles/sdata2018118 https://makedatacount.org/ https://www.projectcounter.org/code-of-practice-sections/general-information/ https://peerj.com/preprints/26505/ http://blog.zenodo.org/2018/07/18/2018-07-18-usage-statistics/ https://www.dataone.org/news/new-usage-metrics

TOP 10 FAIR DATA & SOFTWARE THINGS: Biodiversity

Sprinters: Silvia Di Giorgio, Akinyemi Mandela Fasemore, Konrad Förstner, Till Sauerwein, Eva Seidlmayer (ZB MED - Information Center for Life Science, Cologne, Germany), Ilja Zeitlin, Susannah Bacon, Chris Erdmann (Library Carpentry/The Carpentries/California Digital Library)

Audience: Researchers

Things

Findability

Thing 1: Identifiers
To make data findable, it has to be stored persistently and assigned a unique identifier.
• A digital object identifier (DOI) is a unique, case-insensitive, alphanumeric character sequence and can be very helpful for this purpose. You can reach the identified digital object by using the DOI as a URL: just append the DOI to https://doi.org/ in the address bar (e.g. https://doi.org/10.1109/5.771073). Also see: ANDS Guide: Digital Object Identifier (DOI) System for Research Data. NOTE: The DataCite agency that issues DOIs for the life sciences is PUBLISSO: https://www.publisso.de/wir-fuer-sie/doi-service/

Exercise: Below is a short list of DOIs. Can you match the right document to the appropriate DOI? HINT: Start from https://www.doi.org/!
1. 10.1103/PhysRev.48.73
2. 10.5962/bhl.title.28875
• On the origin of species
• The Particle Problem in the General Theory of Relativity

https://twitter.com/digiorgiosilvia https://sea-region.github.com/fasemoreakinyemi https://twitter.com/konradfoerstner https://twitter.com/TillSauerwein https://sea-region.github.com/EvaSeidlmayer https://rd-alliance.org/users/ilja-zeitlin https://twitter.com/ardcsbacon https://twitter.com/libcce https://doi.org/10.1109/5.771073 https://www.ands.org.au/__data/assets/pdf_file/0006/715155/Digital-Object-Identifiers.pdf https://www.publisso.de/wir-fuer-sie/doi-service/ https://www.doi.org/

Which of these is not a valid DOI?
1. 10.1037/arc0000014
2. 12.1093/fMicb.2018.00257
3. 10.1101/468025
HINT: Check the prefix (before the forward slash)! Which part indicates the publishing institution, the prefix or the suffix of a DOI?

ORCID Exercise:
An ORCID is a self-assigned identifier for authors that avoids author name ambiguity. Use ORCIDs from the start of data creation, i.e. attach the data creator's name/ORCID to the dataset as a metadata field. Include ORCIDs with datasets in repositories (e.g. in the Sequence Read Archive (SRA), include the ORCID for the data creator). This allows for the tracking of data provenance (the origins, custody, and ownership of research data). Go through the Getting Started with ORCID Integration.

Thing 2: Citations
Zenodo, for example, is a tool that makes scientific data and publications easier to cite. It supports various data and license types. It also supports source code from GitHub repositories. See https://zenodo.org/
Exercise:
• Use the Zenodo Sandbox to upload an example dataset, software program, etc.: https://sandbox.zenodo.org/
Questions:
1. Which metadata fields do you have to add when uploading data, and why?
2. Which fields are mandatory and which ones are not?
3. What identifiers can you use?

https://members.orcid.org/api/getting-started https://zenodo.org/ https://sandbox.zenodo.org/

Figure: Uploading to Zenodo (Sandbox).

Thing 3: Wikidata
Wikidata provides a common source of open data which can be used by Wikimedia projects such as Wikipedia, and by anyone else, under a public domain license.
Exercise: Go to Wikidata and find the publication date of the book "On the origin of species".
• Switch over to the linked dataset of the author of the book and see his other publications.
• What did he publish in 1839?
A scripted version of this lookup follows.
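For readers comfortable with a little scripting, here is a minimal sketch of the same lookup against Wikidata's public SPARQL endpoint. It relies on the items and properties Wikidata uses for this data (Q1035 = Charles Darwin, P50 = author, P577 = publication date).

```python
import requests

# Minimal sketch: list works authored by Charles Darwin (Q1035) with their
# publication dates (P577), via the Wikidata Query Service.
query = """
SELECT ?workLabel ?date WHERE {
  ?work wdt:P50 wd:Q1035 ;
        wdt:P577 ?date .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "FAIR-things-example/0.1"},  # WDQS asks for a UA
    timeout=60,
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["date"]["value"][:10], row["workLabel"]["value"])
```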
Thing 4: Registry of Research Data Repositories (re3data)
This project aims to accelerate scientific discovery and enhance the integrity, transparency, and reproducibility of data. To enable FAIR data sharing, data need to be deposited in a repository that is taking steps to make data as open and FAIR as possible. It is not clear-cut what is FAIR at this time - there is no such thing as a FAIR stamp - although the CoreTrustSeal certification provides a good indication. Therefore, under the auspices of the Enabling FAIR Data Project, the American Geophysical Union (AGU), re3data, and DataCite have decided to develop new tools to assist researchers with finding an appropriate repository for their data:
• Browse Subject Repositories
• Repository Finder

https://www.wikidata.org/wiki/Wikidata

Exercise:
1. How many entries are returned for a query specific to your research topic on re3data?
2. If you filter under "Subject", what do you find?
3. Do you think something is missing from the results? If so, suggest a repository.
Try the "browse by Subject" entry to the re3data database, since this gives a great overview of the wide landscape of research data repositories: https://www.re3data.org/browse/by-subject/

Accessibility

Thing 5: Bioschemas
bioschemas.org aims to improve data interoperability in the life sciences. It does this by encouraging people in the life sciences to use schema.org markup, so that their websites and services contain consistently structured information (metadata). This structured information then makes it easier to discover, collate and analyse distributed data. Exercises can be found on the Bioschemas training portal under "tutorials" and "how to":
• https://bioschemas.gitbook.io/training-portal/

Thing 6: Licenses
Knowing the appropriate licenses to use for your data can help others understand how they can use your data and can also help with improving accessibility.
• Open Source Licenses
• Data and Creative Commons licenses
• How to License Research Data
Exercise: Use the Creative Commons License Tool to select the appropriate license for the following intention: allow your work to be adapted and also allow it to be used commercially.

https://www.re3data.org/browse/by-subject/ https://repositoryfinder.datacite.org/ https://www.re3data.org/ https://www.re3data.org/suggest http://bioschemas.org/ https://schema.org/ https://bioschemas.gitbook.io/training-portal/ https://opensource.org/licenses https://wiki.creativecommons.org/wiki/Data_and_CC_licenses http://www.dcc.ac.uk/resources/how-guides/license-research-data https://creativecommons.org/choose

Thing 7: Availability via torrents
The era of Big Data is upon us. A prerequisite for accessibility is availability. Well-established sharing protocols like torrents can help ensure data remain available without the constraint of time and space. Using the torrent protocol for scientific data brings some of the following advantages:
• Immutability
• Distribution capabilities (lower cost for distributing the data)
• No sole maintainer (we don't have to rely on one specific maintainer, because data can be cloned and maintained across peer networks)
The Magnet URI scheme defines the format of magnet links, a de facto standard for identifying files by their content, via a cryptographic hash value, rather than by their location. Using the Magnet URI scheme directly in a publication makes the data accessible from the publication itself; a sketch of building such a link follows this Thing.
For more information, read:
• Academic Torrents
• Magnet URI Scheme
Exercise:
1. Upload any small dataset of your choice via Academic Torrents (linked above).
2. Share with a colleague a link to access it over torrent.
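As a sketch of what the Magnet URI scheme looks like in practice, the snippet below assembles a magnet link from a content hash. The info-hash, file name and tracker are all made up for illustration.

```python
from urllib.parse import quote

# Minimal sketch: build a magnet link. A magnet URI identifies a file by a
# cryptographic hash of its content ('xt' = exact topic), not by location.
info_hash = "c12fe1c06bba254a9dc9f519b335aa7c1367a88a"  # hypothetical SHA-1 info-hash
display_name = quote("example_dataset.tar.gz")           # advisory file name
tracker = quote("udp://tracker.example.org:1337")        # optional, hypothetical

magnet = f"magnet:?xt=urn:btih:{info_hash}&dn={display_name}&tr={tracker}"
print(magnet)
```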
Interoperability

Thing 8: ELIXIR platforms
Standardisation of life science data will ensure interoperability across different subfields. ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe.
• ELIXIR Interoperability Platform
Exercise: Use the ELIXIR software registry bio.tools to find the author of the RNA-seq Python pipeline "READemption".

Thing 9: Research data management
Bio2RDF is a large network of linked data for the life sciences. The database provides interlinked life science data using semantic web technologies. To learn more about Bio2RDF, read Bio2RDF: towards a mashup to build bioinformatics knowledge systems.
• http://bio2rdf.org/

http://academictorrents.com/ https://en.wikipedia.org/wiki/Magnet_URI_scheme https://www.elixir-europe.org/platforms/interoperability https://bio.tools/ https://www.ncbi.nlm.nih.gov/pubmed/18472304

The German Federation for Biological Data (GFBio) is the authoritative, national contact point for issues concerning the management and standardisation of biological and environmental research data during the entire data life cycle (from acquisition to archiving and data publication). GFBio mediates expertise and services between the GFBio data centers and the scientific community, covering all areas of research data management.
• https://www.gfbio.org/

Reusability

Thing 10: Machine-readability
Make the data accessible via an API, in a structured data format that can be automatically read and processed by a computer. See the Open Data Handbook Glossary - Machine readable.
Exercise - Crossref:
1. Pick the DOI of a publication of your choice.
2. In a web browser, open https://api.crossref.org/works/DOI, replacing DOI with the DOI of the publication. Example: https://api.crossref.org/works/10.1371/journal.pcbi.1004668
Exercise - DataCite:
1. Pick the DOI of a dataset in Zenodo.
2. Open https://api.datacite.org/works/DOI, replacing DOI with the DOI of the Zenodo entry. Example: https://api.datacite.org/works/10.5281/zenodo.1574835
The same lookups can be scripted; a sketch follows.
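Here is a minimal sketch of the Crossref exercise done in code instead of the browser address bar; swapping the base URL for https://api.datacite.org/works/ retrieves dataset metadata the same way.

```python
import requests

# Minimal sketch: fetch machine-readable metadata for a DOI from the
# Crossref REST API, using the example DOI from the exercise above.
doi = "10.1371/journal.pcbi.1004668"

response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
response.raise_for_status()
work = response.json()["message"]  # Crossref wraps the record in "message"

print("Title:  ", work["title"][0])
print("Journal:", work.get("container-title", ["n/a"])[0])
print("Type:   ", work["type"])
```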
Thing 11: Digitalization
If the methods used to record complex experiments are prone to error, so that reproducible results cannot be guaranteed, how can you ever be sure you're dealing with real insights and not random information? The electronic lab notebook provides the missing infrastructure for data recording, retrieval and integrity. An electronic lab notebook must be able to create, import, store and retrieve all important data types in digital format. For more information, read:
• Kanza, Samantha et al. "Electronic lab notebooks: can they replace paper?" Journal of Cheminformatics vol. 9,1 31. 24 May 2017, doi:10.1186/s13321-017-0221-3
• Electronic Lab Notebook Matrix
Exercise: Explore the demo lab notebook at https://demo.elabftw.net/experiments.php

http://opendatahandbook.org/glossary/en/terms/machine-readable/ https://api.crossref.org/works/DOI https://api.crossref.org/works/10.1371/journal.pcbi.1004668 https://api.datacite.org/works/DOI https://api.datacite.org/works/10.5281/zenodo.1574835 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5443717/

Thing 12: Containers
In a scientific field, most of the time we have to deal with large amounts of data that have to be processed before publication. One important aspect of the reproducibility challenge is ensuring computational analysis can be reproduced, even in different environments. For more information, read:
• Grüning, Björn, et al. "Practical computational reproducibility in the life sciences." Cell Systems 6.6 (2018): 631-635.
Exercise: Learn Docker & containers using interactive browser-based scenarios: https://www.katacoda.com/courses/docker

Thing 13: Blockchain for Life Science
Blockchain technology has the potential to be a technical solution to the current reproducibility crisis in science, and could "reduce waste and make more research results true". See:
• Mapping the blockchain for science landscape
• Blockchain for science and knowledge creation: A technical fix to the reproducibility crisis?
Living document example: See Blockchain for Open Science - the living document: https://www.blockchainforscience.com/2017/02/23/blockchain-for-open-science-the-living-document/

https://datamanagement.hms.harvard.edu/electronic-lab-notebooks https://demo.elabftw.net/experiments.php https://www.cell.com/cell-systems/pdf/S2405-4712(18)30140-6.pdf https://www.katacoda.com/courses/docker https://hackernoon.com/mapping-the-blockchain-for-science-landscape-546b61bfbd1 https://zenodo.org/record/60223/files/ZenodoBlockchainforScienceKnowledgeCreation.pdf

Supplementary information:

Research Data Infrastructure for the Life Sciences (NFDI4Life)
In the context of the planned German National Research Data Infrastructure (NFDI), and as a response to the increasing scientific and societal demand for data and data analysis, NFDI4Life brings together scientific communities and research data infrastructures broadly covering the life sciences, with particular focus on the subdomains biology, medicine (with veterinary medicine), epidemiology, nutrition, agricultural and environmental science, as well as biodiversity research.
• https://www.nfdi4life.de/

Carpentries Community
The Carpentries develops and teaches workshops on the fundamental data skills needed to conduct research.
• https://carpentries.org/

GO FAIR Initiative
GO FAIR follows a bottom-up open implementation strategy for the European Open Science Cloud (EOSC) as part of a broader global Internet of FAIR Data & Services.
• https://eosc-portal.eu/
• https://www.go-fair.org/

FAIRDOM
FAIRDOM supports researchers, students, trainers, funders and publishers in making their data, operating procedures and models Findable, Accessible, Interoperable and Reusable (FAIR).
• https://fair-dom.org/about-fairdom/

TOP 10 FAIR DATA & SOFTWARE THINGS: Australian Government Data/Collections

Sprinters: Katie Hannan, Data Librarian (CSIRO), Richard Ferrers, Research Data Analyst (ARDC), Keith Russell, Manager Engagements (ARDC)

FAIR data: See the ARDC image summarising what FAIR means; see also the Force11 definition.

Figure 1: FAIR in a nutshell. Image: ARDC 2018 - CC-BY 4.0.

https://orcid.org/0000-0002-5689-4133 https://twitter.com/valuemgmt https://www.rd-alliance.org/users/kgrussell https://www.ands.org.au/__data/assets/image/0011/1416098/FAIR-Data-image-map-graphic-v2-721px.png https://www.force11.org/group/fairgroup/fairprinciples https://www.pmc.gov.au/resource-centre/public-data/australian-government-public-data-policy-statement

Description: Governments have a mandate to make non-sensitive data open. For example, the Australian Government Public Data Policy Statement says "Australian Government entities will ... make non-sensitive data open by default...make high value data available for use by the public, industry and academia... ensure non-sensitive publicly funded research data is made open for use and reuse...
to extend the value of public data for the benefit of the Australian public." FAIR data is a way to extend the value of data. The largest 20 nations, the G20, agreed to make Open Data Principles a priority at the 2015 meeting in Turkey, saying "Transparency... Global transformation, facilitated by technology, fuelled by data and information... Open data is at the center of this global shift." (p.2).

Audience: Government data custodians

Goal: Help government data custodians to understand FAIR data principles

NB: Nomenclature and data: Where "data" is used here, we also mean collections such as cultural collections, historical collections, documents, artefacts and other valuable collections.

Table of contents
1. Thing 1 - Why is data important?
2. Thing 2 - Open data vs FAIR data
3. Thing 3 - Data discovery (F)
4. Thing 4 - Describing your data (FAI)
5. Thing 5 - Identifiers (F)
6. Thing 6 - Licensing (R)
7. Thing 7 - Dirty Data (R)
8. Thing 8 - Sensitive Data (A)
9. Thing 9 - Vocabularies (I)
10. Thing 10 - Data Impact (R)

Things

Thing 1: Why is data important?
Read the G20, Australian and state policies on Open Data.

http://www.g20.utoronto.ca/2015/G20-Anti-Corruption-Open-Data-Principles.pdf

Figure 2: Data sharing drivers. Source: Katie Hannan, 2018, CC-BY.

Beginner activity:
International
G20: Open Government Forum; G20 Turkey 2015. "Transparency... Global transformation, facilitated by technology, fuelled by data and information... Open data is at the center of this global shift." (p.2)
Read and consider the G20 Open Data Principles. Familiarise yourself with your state or territory's data policy. See links in Appendix 1.

Australia
• Public data policy statement
Office of the Australian Information Commissioner: Principles on open public sector information
• Principle 1: Open access to information - a default position. "Information held by Australian Government agencies is a valuable national resource. If there is no legal need to protect the information it should be open to public access."
• Principle 3: Effective information governance. "Ensuring agency compliance with legislative and policy requirements on information management and publication."
• Principle 4: Robust information asset management
• Principle 5: Discoverable and useable information. "Ensure that information published online is in an open and standards-based format and is machine-readable." "Attach high quality metadata to information so that it can be easily located and linked to similar information using standard web search applications."
• Principle 6: Clear reuse rights. "The economic and social value of public sector information is enhanced when it is made available for reuse on open licensing terms."
See Appendix 1 for a list of Australian state open data policies.

https://www.pmc.gov.au/public-data/public-data-policy https://www.oaic.gov.au/information-policy/information-policy-resources/principles-on-open-public-sector-information
Intermediate activity:
The following legislation may apply to the management of government data:
• Archives Act 1983 - https://www.legislation.gov.au/Details/C2016C00772
• Freedom of Information - http://my.csiro.au/Support-Services/Legal/FOI.aspx
• Privacy - http://my.csiro.au/Support-Services/Legal/Privacy-Law.aspx
• Australian Government intellectual property rules - https://www.communications.gov.au/policy/policy-listing/australian-government-intellectual-property-rules
• Records Disposal Authority - An agency-specific records authority may have advice that you need to follow. Find your agency here - http://www.naa.gov.au/information-management/records-authorities/types-of-records-authorities/Agency-RA/index.aspx
• New Australian Government Sharing and Release Legislation (open for public comment; shows where legislation is going): https://www.pmc.gov.au/sites/default/files/publications/australian-government-data-sharing-release-legislation_issues-paper.docx

Advanced activity:
If your organisation doesn't have a policy on open data, who are the key stakeholders that you would need to work with to prepare an open data policy? What main headings would you need to include as part of your data policy?

Thing 2: Open data vs FAIR data
Read https://www.go-fair.org/faq/ask-question-difference-fair-data-open-data/
Can you think of examples of data you deal with that cannot be made open but can be made FAIR? List some advantages of making this data FAIR. Does the current wording in the policy for open data encourage making the data FAIR? Where do you see gaps? See slide 14 here: https://www.slideshare.net/sjDCC/open-fair-data-and-rdm

Beginner activity:
See how Geoscience Australia implements the FAIR data principles in its work. Geoscience Australia describes itself as "the nation's trusted advisor on the geology and geography of Australia" (GA 2018).

Advanced activity:
How FAIR is your data? - https://www.ands-nectar-rds.org.au/fair-tool
Try the tool now; then, after finishing these modules and making some changes to a data collection, test the collection again with the tool.

Thing 3: Data discovery
• What's a data repository?
• What's a data portal?
• Where to find data?
• Where to store data?
• Data.gov.au (and search.data.gov.au!) - find - this is an aggregator. (A sketch of searching it from code follows this Thing.)
See https://data.gov.au/dataset/list-of-australian-government-data-portals for a list of Australian Government data portals (current as of March 2017). Some other data portals appear on https://data.gov.au/harvest.
• CSIRO DAP - find/store
• National Map - find
• Re3data.org - registry of research repositories (etc.)
International government data portals:
• United Kingdom - https://data.gov.uk/
• New Zealand - https://www.data.govt.nz/
• Canada - https://open.canada.ca/en/open-data
• United States of America - https://www.data.gov/
• India - https://data.gov.in/
• Finland - https://vm.fi/en/opendata
• Singapore - https://data.gov.sg
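Here is a minimal sketch of searching the aggregator from code. It assumes data.gov.au exposes the standard CKAN action API (the portal is CKAN-based); check the portal's developer documentation for the current endpoint path.

```python
import requests

# Minimal sketch: search an open-data aggregator programmatically.
# Endpoint path assumed from the standard CKAN Action API.
url = "https://data.gov.au/api/3/action/package_search"

response = requests.get(url, params={"q": "geology", "rows": 5}, timeout=30)
response.raise_for_status()
result = response.json()["result"]

print(result["count"], "datasets match 'geology'; first few:")
for dataset in result["results"]:
    print("-", dataset["title"])
```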
Thing 4: Describing your data or collection
• Include a description of the data. What should go in a description?
• What makes a good description?
See the ANDS Content Providers Guide on descriptions -> Best Practice -> Writing good descriptions:
• Write the description for a reader who has a general familiarity with a research area but is not a specialist - this will make data more accessible for cross-disciplinary use.
• Don't use specialist acronyms or obscure jargon.
• Don't assume a reader has specialist knowledge.
Some reusable content here - https://ecu.au.libguides.com/10-marine-science-rdm-things/Thing6

https://data.csiro.au/dap/home?execution=e1s1 https://nationalmap.gov.au/ https://documentation.ands.org.au/display/DOC/Description

Beginner activity:
Read a data description on data.gov.au (e.g. Arts Victoria, ABC) or Research Data Australia (e.g. National Archives of Australia, Australian Antarctic Data Centre, CSIRO (Commonwealth Scientific and Industrial Research Organisation), Geoscience Australia).
Reflection: Could you understand the description? Can you think of someone for whom this data or collection would be useful? Was it clear where to go next to access the data, or to ask for more information about this data or collection? What else would you like to know about this data/collection?
Activity: Post your questions or responses to the reflection above to the data custodian, or to the comments section at data.gov.au.

Intermediate activity:
If you are a data custodian/researcher, consider the five most important datasets that you have contributed to or that you manage. Pick the most important dataset to describe.
1. Start with: Title, Author, Year, Institution, Location/URL. This is the minimum description required to get a DOI (a permanent identifier); a sketch of such a record follows this Thing. The URL for a DOI is the home page for the dataset description. If you don't have one, make a person's contact details the URL.
• (Hint: if you get stuck with the description, copy the abstract of a paper, conference paper or annual report which uses or references your dataset. Edit the abstract to talk only about the data.)
Q: What type of data identifier does a government data custodian have?
2. Add richer material to your data description, e.g. subjects and grant IDs (where applicable - RDA, the Australian national data catalogue, has permanent URLs for Australian ARC and NHMRC grants). Include a significant statement about why the dataset is important.
3. Ask a colleague in a related field if they can understand your description. This helps ensure that the description is broadly readable by someone who is not deeply knowledgeable in your field.

Advanced activity:
Publish your data description on your resume, especially if it is online, e.g. LinkedIn. Send your data description to your data librarian for addition to your institutional repository or data portal. Alternatively, post your description to a public cloud service, such as Zenodo, Figshare or Data Dryad. No data need be included. A description record is valuable in itself, as it reveals the existence of data previously unknown and inaccessible.
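As a worked illustration of the minimum description above, here is a sketch of those five fields arranged as a simple record; they map closely onto DataCite's mandatory properties (identifier, creator, title, publisher, publication year). All values, including the DOI, are hypothetical.

```python
# Minimal sketch: the five minimum description fields as a simple record.
# Every value below is made up for illustration.
minimal_description = {
    "identifier": "https://doi.org/10.26186/0000000",  # hypothetical DOI URL
    "title": "Groundwater bore observations, example region, 2010-2018",
    "creators": ["Hypothetical Agency Data Team"],
    "publisher": "Example Government Agency",           # the institution
    "publicationYear": 2019,
    # Richer description (step 2): subjects, grant IDs, significance, etc.
    "subjects": ["groundwater", "hydrogeology"],
}

for field, value in minimal_description.items():
    print(f"{field:16} {value}")
```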
https://www.data.gov.au/organization/artsvictoria https://www.data.gov.au/organization/australianbroadcastingcorporation https://researchdata.ands.org.au/contributors/national-archives-of-australia https://researchdata.ands.org.au/contributors/australian-antarctic-data-centre https://researchdata.ands.org.au/contributors/commonwealth-scientific-and-industrial-research-organisation https://researchdata.ands.org.au/contributors/geoscience-australia https://researchdata.ands.org.au/grants https://www.linkedin.com/ https://zenodo.org/ https://figshare.com/ https://datadryad.org/

Thing 5: Identifiers
To make data findable, it has to be stored persistently with a unique identifier. A digital object identifier (DOI) is a unique, case-insensitive, alphanumeric character sequence and can be very helpful for this purpose. See also the ANDS Guide: Digital Object Identifier (DOI) System for Research Data (https://www.ands.org.au/__data/assets/pdf_file/0006/715155/Digital-Object-Identifiers.pdf).
See who mints ANDS DOIs, including the NSW Office of Environment and Heritage, the Bureau of Meteorology, CSIRO, Geoscience Australia and the Dept of Environment.
Types of persistent identifiers:
• DOI
• Handle
• IGSN
Videos: Watch the video Persistent identifiers and data citation explained by Research Data Netherlands - https://youtu.be/PgqtiY7oZ6k
Read about persistent identifiers at a very general level (awareness): https://www.ands.org.au/guides/persistent-identifiers-awareness
A DOI requires five fields: author, title, year, publisher, and the URL of the DOI landing page.
Beginner activity:
Visit http://www.doi.org/ and try resolving these DOIs (a scripted version follows this Thing):
10.26179/5bf63428ea2a1
10.26186/5b76556b396c0
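Here is a minimal sketch of the same resolution done in code: doi.org answers a DOI with a redirect to the landing page, so following the redirect reveals where the identifier points.

```python
import requests

# Minimal sketch: resolve a DOI by following the redirect from doi.org.
# Uses the first DOI from the beginner activity above. Note: some landing
# pages dislike HEAD requests; swap in requests.get() if needed.
doi = "10.26179/5bf63428ea2a1"

response = requests.head(f"https://doi.org/{doi}", allow_redirects=True, timeout=30)
print("DOI:         ", doi)
print("Resolves to: ", response.url)
```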
Thing 6: Licensing
See the licensing guide: what is the appropriate licence for data produced by a government agency? Refer to the Australian Government public data policy statement: "At a minimum, Australian Government entities will publish appropriately anonymised government data by default: ...under a Creative Commons By Attribution licence (i.e. CC BY licence) unless a clear case is made to the Department of the Prime Minister and Cabinet for another open licence." Specific CC licences, which require DPC approval, include NC - non-commercial, SA - share alike, and the very restrictive (and not recommended by ANDS) ND - no derivatives allowed.
Examples of licensing statements: http://www.bom.gov.au/waterdata/index.shtml?selected=Copyright

https://www.ands.org.au/guides/research-data-rights-management https://www.pmc.gov.au/public-data/public-data-policy

Thing 7: Dirty data
Why is "clean" data important? Public policy, changes to medical protocols and economic decisions all depend on accurate and complete data. See further at this ECU resource, which looks at the why and what of "dirty data": https://ecu.au.libguides.com/10-marine-science-rdm-things/Thing10
Beginner activity:
Read this case study: The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a variety of databases and file formats. This lets data analysts spend less time cleaning up and managing data, and more time analysing it. https://frictionlessdata.io/articles/the-data-retriever/
• Bad data guide - https://github.com/Quartz/bad-data-guide
• Releasing data or statistics in spreadsheets - http://www.clean-sheet.org/
• How to share data with a statistician - https://github.com/jtleek/datasharing
• A gentle introduction to data cleaning - https://schoolofdata.org/courses/#IntroDataCleaning
• Tidy data for librarians - https://librarycarpentry.org/lc-spreadsheets/
Advanced activity:
• OpenRefine - https://librarycarpentry.org/lc-open-refine/
• Clean Your Data: Getting Started with OpenRefine [video] - https://www.youtube.com/watch?v=wGVtycv3SS0
A small scripted example of typical clean-up steps follows.
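To show what a few of these clean-ups look like in practice, here is a minimal sketch using pandas on a made-up sample; the guides linked above cover the reasoning in depth.

```python
import pandas as pd

# Minimal sketch of common 'dirty data' fixes: stray whitespace,
# inconsistent case, numbers stored as text, and duplicate rows.
# The sample data is invented for illustration.
raw = pd.DataFrame({
    "site":    [" Darwin", "darwin ", "Perth", "Perth"],
    "depth_m": ["5", "5", "12", "12"],
})

clean = (
    raw.assign(
        site=raw["site"].str.strip().str.title(),   # trim + normalise case
        depth_m=pd.to_numeric(raw["depth_m"]),      # text -> numbers
    )
    .drop_duplicates()                              # remove exact repeats
    .reset_index(drop=True)
)
print(clean)
```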
Thing 8: Working with sensitive data
What is sensitive data? FAIR data doesn't need to be published as open data - see Thing 2.
Reuse: https://www.ands.org.au/working-with-data/skills/23-research-data-things/10-medical-and-health-things/m-and-h-thing-4
Useful resources:
• CSIRO Data61, The De-Identification Decision-Making Framework - https://publications.csiro.au/rpr/download?pid=csiro:EP173122&dsid=DS3
• Indigenous Knowledge: Issues for protection and management - https://www.ipaustralia.gov.au/sites/g/files/net856/f/ipaust_ikdiscussionpaper_28march2018.pdf
Additional resources (from Library-Research-Support-Top-10-FAIR-Things_DRAFT):
• Despite being written for Human Research Ethics Committees, the ANDS Human Research Ethics Committees guide is a handy overview for people interested in making personal data FAIR: https://www.ands.org.au/__data/assets/pdf_file/0009/748737/HREC_Guide.pdf Key points: "....."
• NHMRC National Statement on Ethical Conduct of Human Research (2018) - Ch 3.1, Element 4: https://nhmrc.gov.au/about-us/publications/national-statement-ethical-conduct-human-research-2007-updated-2018#element_4__data_collection_and_management Key points: "...."
• Guiding Principles for Ethical Research (U.S. National Institutes of Health) - https://www.nih.gov/health-information/nih-clinical-research-trials-you/guiding-principles-ethical-research

Thing 9: Vocabularies - Assisting with interoperability
Beginner activity: Controlled vocabularies for data description
In addition to selecting a metadata standard or schema, whenever possible you should also use a controlled vocabulary. A controlled vocabulary provides a consistent way to describe data - location, time, place name, and subject. Controlled vocabularies significantly improve data discovery. They make data more shareable with researchers in the same discipline because everyone is 'talking the same language' when searching for specific data, e.g. plants, animals, medical conditions, places, etc.
1. Start by browsing Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Make sure you scroll down to 5. Conclusion - it's worth a read.
Advanced activity:
Have a browse around the stunning level of data description and data contained in the Atlas of Living Australia.
Other examples:
• Geoscience Australia - http://ldweb.ga.gov.au/def/voc/ga/
• National Environmental Information Infrastructure - http://www.neii.gov.au/vocabulary/vocabulary-providers
• Australian Governments' Interactive Functions Thesaurus (AGIFT) - http://www.naa.gov.au/information-management/managing-information-and-records/describing/AGIFT/index.aspx (of interest to the Australian Government Linked Open Data working group)

http://www.ala.org.au/ http://www.linked.data.gov.au/

Data dictionaries: standardised, accepted terms and protocols used for data collection.
• Australian Institute of Health and Welfare - http://meteor.aihw.gov.au/content/index.phtml/itemId/274816
• Australian Business Register - https://abr.gov.au/For-Government-agencies/Accessing-ABR-data/ABR-Data-Dictionary/
• Health.VIC - https://www2.health.vic.gov.au/about/reporting-planning-data/data-dictionaries
• South Australian electronic forms data dictionary - https://www.sa.gov.au/editors/electronic-forms-platform/data-dictionary
• Growing up in Australia Data Dictionary - https://growingupinaustralia.gov.au/data-and-documentation/data-dictionary
• Department of Social Services Settlement Database Data Dictionary - https://www.dss.gov.au/our-responsibilities/settlement-services/programs-policy/settlement-services/settlement-reporting-facility/help-for-settlement-reports/data-dictionary

Thing 10: Data impact
Data reuse is hard to check or track when you don't have persistent identifiers, and there is not much of a data citation culture.
Web stats: Selected data.gov.au web analytics - https://search.data.gov.au/dataset/ds-dga-9fa9bfda-96b3-4214-8a09-497af105524b/details?q=data.gov.au
Some older uses of open data: https://data.gov.au/showcase
Use in GovHack (AU) - https://twitter.com/govhackau?lang=en
Tracking identifiers - data citation
Beginner activity:
Looking at the broader impact of how the data has been used and the benefits it has brought to society, industry, the economy, etc. is a richer source of impact evidence than just looking at citations. https://www.ands.org.au/working-with-data/articulating-the-value-of-open-data/data-engagement-and-impact

Postscript: Other topics to consider:
• Data people - data technologists, data librarians, data trainers, data leaders, data scientists
• Data governance - policy, procedure, planning, improving systems, requesting funding, building business cases for change
• Data training - when: induction, checkups, when problems occur; what: store, describe, how and why to do data. Advanced topics, e.g. sensitive data, spatial data, vocabularies, provenance.
See for example slide 54 in this Data Readiness slideshow, as well as the 24th edition of Share.
https://www.slideshare.net/RichardFerrers/the-national-eresearch-and-data-management-landscape-cdu-data-readiness-training-nov-18 https://www.ands.org.au/news-and-events/share-newsletter/share-24

[Image: People in Data]

References:
• Government data links - https://toolkit.data.gov.au/index.php/Main_Page
• Public Records Office Victoria - https://www.prov.vic.gov.au/about-us/partnerships-and-collaborations/open-data

Appendix: List of Australian state/territory government open data policies
Australian Federal Government: Refer to the policy at the Dept of Prime Minister and Cabinet. See also the National Data Commissioner, "responsible for implementing a simpler data sharing and release framework".
Victoria: Data Access Policy. "The Victorian Government recognises the benefits from and encourages the availability of Victorian government data for the public good.
New South Wales (NSW) Policy - https://www.finance.nsw.gov.au/ict/resources/nsw-government-open-data-policy
"The objectives of this policy are to assist NSW Government agencies to: release data for use by the community, research, business and industry; accelerate the use of data to derive new insights for better public services; embed open data into business-as-usual..."

Queensland Policy - https://www.oic.qld.gov.au/publications/policies/open-data-strategy
Tasmania Policy - http://www.egovernment.tas.gov.au/stats_matter/open_data/tasmanian_government_open_data_policy
South Australia Policy - https://digital.sa.gov.au/resources/topic/open-data/open-data-declaration
Western Australia Policy - https://data.wa.gov.au/open-data-policy
Australian Capital Territory Policy - http://www.cmd.act.gov.au/__data/assets/pdf_file/0011/859430/2016-Proactive-Release-of-Data-Open-Data-Policy.pdf
Northern Territory Policy (Darwin) - https://www.darwin.nt.gov.au/sites/default/files/publications/attachments/policy_no_086_-_open_data.pdf

TOP 10 FAIR DATA & SOFTWARE THINGS: Archaeology

Sprinters: Deidre Whitmore (https://github.com/deidrewhitmore) and Tim Dennis (https://github.com/jt14den) (UCLA)

Description: This guide brings the concepts surrounding the FAIR data principles and the 23 (research data) Things program to the archaeological research domain, with the aim of fostering better data practices and stewardship throughout the discipline.

Audience: Researchers, scholars, employees, students, volunteers -- anyone working with or around data collected for archaeological research and management.

How to use this guide? You don't have to do all of the Things, and in fact, you may not be able to do every Thing. However, familiarize yourself with each Thing and implement those which suit your work and interests. Try to schedule time regularly to learn more about a Thing and work through how you could integrate it into your own research practices.

Why this guide? Archaeological data is costly to collect, difficult or impossible to re-collect, and frequently lacks the context or documentation needed for reuse. Because of this, the domain has not yet coalesced around standards, though guidelines and data services are gaining traction. This guide introduces these services and calls out resources that can facilitate the adoption of leading practices.

Data in archaeology: Archaeologists collect and work with a wide range of data types: textual, visual (raster, vector), tabular (spreadsheets, databases), spatial, audio, 3D, etc. This makes the creation and adoption of standards surrounding data management challenging, but also all the more necessary, as these varied types frequently need to be analyzed together and shared among collaborators.
After working through the 10 Things below you'll know how to:
• plan and prepare for data collection so that the data collected are FAIR
• document collection, processing, and analyses to support FAIR data
• draft and refine a data model
• find training or data specialists who can assist you in your work
• identify the multiple roles in an interdisciplinary project
• plan for a field season that integrates best practices for data management
• cite data, publish your data so that it can be cited, and explain why it is important to do so
• write a good data management plan
• identify the major data repositories in archaeology
• reference the Guides to Good Practice and know when to do so (at the start of a project and prior to collecting data!)
• evaluate tools that exist and can be used for humanities data

Things

Thing 1: Understanding the lifecycle of research data

Getting started
* Read Planning for the Creation of Digital Data in the Digital Antiquity Guides to Good Practice (http://guides.archaeologydataservice.ac.uk/g2gp/CreateData_1-0).
* Consider the types of data collected and used within your own work. How many file formats do you work with regularly? How many files have become inaccessible to you over the years? To your colleagues or collaborators?

Learn more
* Watch the short film on the lifecycle of research data at https://www.ukdataservice.ac.uk/manage-data/lifecycle.
* Map out the lifecycle of data on your most recent project. What processes and workflows have gotten you to the stage you are at currently? What can you do to facilitate the ongoing use and reuse of your data?

Challenge me
* Read Project Documentation (http://guides.archaeologydataservice.ac.uk/g2gp/CreateData_1-1) and Project Metadata (http://guides.archaeologydataservice.ac.uk/g2gp/CreateData_1-2) in the Digital Antiquity Guides to Good Practice.
* Draft documentation for your most recent project or a forthcoming project. Include information about the background, the methodology employed or to be employed, and a narrative on the site and its context (historical, archaeological, cultural, etc.). This documentation will facilitate not only the eventual dissemination of your data but also any proposals or publications about the work itself (a small sketch follows at the end of this Thing).
* Review the metadata for this project: document in a single location what metadata you currently record or plan to record, and compare it to the metadata tables at http://guides.archaeologydataservice.ac.uk/g2gp/CreateData_1-2. Are you missing any Project Metadata? File-Level Metadata (general and technical)? How can you fill in any gaps?
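Project documentation can also be captured in a small machine-readable file that travels with the data. The following is a minimal sketch, not an activity from the guide: the field names are illustrative, loosely echoing the kinds of project metadata the Guides to Good Practice tables cover, so check the tables themselves for the authoritative list.

```python
import json

# A minimal, illustrative project-metadata file kept alongside the data.
# Field names are placeholders, not a published standard.
project_metadata = {
    "project_title": "Example Valley Survey",  # hypothetical project
    "investigators": ["A. Researcher"],
    "methodology": "Systematic pedestrian survey, 20 m transects",
    "site_context": "Multi-period site; see narrative in README",
    "collection_dates": "2018-06/2018-08",
    "file_formats": ["csv", "tif", "shp"],
}

# Writing it as JSON keeps it both human-readable and machine-actionable.
with open("project_metadata.json", "w") as f:
    json.dump(project_metadata, f, indent=2)
```

Even a file this small answers the questions a future user (including future you) asks first: who collected the data, how, when, and in what formats.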
Thing 2: Preservation

Getting started
* Browse the websites for archaeological data repositories and preservation services: Archaeology Data Service (http://archaeologydataservice.ac.uk/), tDAR (https://www.tdar.org/about/), Open Context (https://opencontext.org/).
* Identify which service(s) contain data of interest to your work. Get familiar with searching the services.
* Read Why Deposit Data (http://archaeologydataservice.ac.uk/deposit/Why.xhtml) and consider what is significant about your data, what requirements you need to meet, and which reasons resonate with your work and beliefs.

Learn more
* Dig into the deposit instructions and criteria for each repository and service and identify which is the best fit for your own data.
* Contact the service and discuss your project and data with them. Document their recommendations and determine how you can update your current workflow to support deposit.

Challenge me
* Select a dataset you can deposit and go through the process of depositing it in a repository.

Thing 3: Training and community

Getting started
* Review online resources and training materials for archaeological data management, such as DataTrain's 'Open Access Post-Graduate Teaching Materials in Managing Research Data in Archaeology' at http://archaeologydataservice.ac.uk/learning/DataTrain.xhtml.

Learn more
* Attend a workshop or session on data management at an upcoming archaeology conference.

Challenge me
* Attend a conference or program on data and scholarly communication, such as FORCE11's Scholarly Communication Institute (https://www.force11.org/fsci) and/or ASIST's Annual Meeting (https://www.asist.org/events/annual-meeting/).
* If you are in a position to do so, incorporate archaeological data management and preservation lessons into courses you teach. Consider inviting a data librarian or information specialist who is familiar with archaeological data to be a guest speaker.

Thing 4: Data Management Plan (DMP) tools

Getting started
* Review the guidelines for DMPs from funding agencies you are considering or have applied to in the past: NEH, NSF, AIA, etc. See SPARC's Browse Data Sharing Requirements by Federal Agency (http://datasharing.sparcopen.org/data).

Learn more
* Check if your institution is participating in the DMP Tool (meaning they have customized the tool to point to institutional resources and services) at https://dmptool.org/public_orgs.
* Read through publicly available DMPs at https://dmptool.org/public_plans and consider what makes them strong or weak. Take notes on what aspects are important to include when writing your own.

Challenge me
* Use a DMP Tool to create a DMP for a project you are currently working on or planning to start.
* Ask a data librarian or specialist at your institution to review your DMP.

Thing 5: Describing data

Getting started
* Learn more about metadata schemas, controlled vocabularies, and why describing data is a good practice. Read What are Metadata Standards from the Digital Curation Centre at http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards and Preparing Datasets - Metadata from ADS at http://archaeologydataservice.ac.uk/advice/PreparingDatasets.xhtml#Metadata0.
* Consider your current metadata practices - do they follow any schema or incorporate any vocabularies? Are your metadata fields described and documented explicitly?

Learn more
* Review some of the vocabularies and thesauri related to archaeological data, including Getty Vocabularies at http://www.getty.edu/research/tools/vocabularies/index.html and PeriodO at http://perio.do/en/.
* Consider whether these vocabularies could be incorporated into your data practices and workflow.

Challenge me
* Create a data dictionary (metadata field, type, definition, controlled vocabulary status) for a current or future project, based on the metadata recommendations in the Guides to Good Practice (see the sketch below).
* Do this for each type of data you plan to or have collected that has an associated Guide (i.e. raster images, geophysics, GIS).
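A data dictionary does not need special software; a plain CSV file is enough. Here is a minimal sketch with illustrative field names; the vocabulary annotations point at Getty AAT and PeriodO simply because those are the vocabularies mentioned in Learn more above, not because those fields require them.

```python
import csv

# A minimal, illustrative data dictionary: one row per recorded field,
# with its type, definition, and controlled-vocabulary status.
fields = [
    ("context_id", "string", "Unique identifier for the excavation context", "none"),
    ("material", "string", "Material of the find", "Getty AAT term"),
    ("period", "string", "Chronological period of the deposit", "PeriodO period URI"),
    ("weight_g", "number", "Weight of the find in grams", "none"),
]

with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["field", "type", "definition", "controlled_vocabulary"])
    writer.writerows(fields)
```

Keeping the dictionary next to the data means anyone opening the dataset can see at a glance what each column means and which values are constrained.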
Thing 6: Cleaning, processing, and documentation

Getting started
* Learn about processing and documentation in 'Data Selection: Preservation Intervention Points' at http://guides.archaeologydataservice.ac.uk/g2gp/ArchivalStrat_1-3.
* Consider your own workflow and the different stages at which your data is transformed. Write down the equipment and instruments you use to collect data and the process for obtaining the data from those instruments (i.e. calibrating, exporting).

Learn more
* Investigate tools that facilitate data cleaning and documentation, such as OpenRefine (http://openrefine.org/).
* Attend a workshop or go through a tutorial to learn how to use the tool and its features, including exporting the record of your cleaning steps.

Challenge me
* Choose a recent dataset you've collected and go through the processing and cleaning workflow. Be sure to document every step and follow conventions for file names, file formats, and backup creation.

Thing 7: Sharing

Getting started
* Learn more about why sharing data matters in archaeology. Explore publications on archaeological data, reuse, and publishing, including Openness and archaeology's information ecosystem (https://escholarship.org/uc/item/9tq378jg) and Other People's Data: A Demonstration of the Imperative of Publishing Primary Data (https://escholarship.org/uc/item/1nt1v9n2).
* Consider times you haven't been able to access data associated with your research. How did you address this issue?
* Consider times you have tried to use collaborators' or colleagues' data in your own research. What steps did you have to take to make sense of the data, to incorporate it into your own dataset, or to analyze it? What might have made this process easier?

Learn more
* Learn more about sensitive data and what you can do to protect it while still making it accessible, using resources such as ANDS (https://www.ands.org.au/working-with-data/sensitive-data/sharing-sensitive-data) and ADS (http://archaeologydataservice.ac.uk/advice/sensitiveDataPolicy.xhtml).
* Consider whether there are any ethical or legal restrictions around data in your own work. Discuss these considerations with the appropriate representatives and determine what the best plan for sharing data is for all relevant parties.

Challenge me
* Learn more about the differences between publishing and sharing data, then either:
* Prepare a dataset of your own for sharing with a colleague or collaborator, and ask them to report back on any issues they faced understanding the data or accessing files or information, and what you could have done to simplify their use of the dataset.
* Or publish a dataset of your own. This can be done in association with an article or book, as a data paper with a journal that specializes in data publication, or through a data publishing service.
Consider the challenges you faced as you prepared the dataset and what you can do to simplify the process next time, then incorporate these practices into your workflow (the sketch below shows one small step you can automate).
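One concrete habit that makes shared datasets easier to trust is shipping a manifest of files and checksums with the data, so the recipient can verify that nothing was lost or corrupted in transfer. This is a minimal sketch, not part of the original activity; the folder name is hypothetical.

```python
import hashlib
from pathlib import Path

# Write a MANIFEST.txt listing every file in the dataset with its
# SHA-256 checksum. "my_dataset" is a hypothetical folder name.
dataset_dir = Path("my_dataset")

with open("MANIFEST.txt", "w") as manifest:
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest.write(f"{digest}  {path.relative_to(dataset_dir)}\n")
```

A collaborator can recompute the checksums on their copy and compare against the manifest; any mismatch pinpoints exactly which file was altered or truncated.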
Thing 8: Citation

Getting started
* Data citation continues the tradition of acknowledging other people's work and ideas. Along with books, journals and other scholarly works, it is now possible to formally cite research datasets and even the software that was used to create or analyze the data. Consider whether there have been times your data was reused by someone else and whether you received scholarly credit.
* Read the FORCE11 Joint Declaration of Data Citation Principles at https://www.force11.org/datacitationprinciples.
* Watch this video on persistent identifiers and data citation at https://www.youtube.com/watch?v=PgqtiY7oZ6k.
* Search data repositories and services such as ADS, tDAR, and Open Context and see how their recommended citations are formatted.

Learn more
* Consider how many times you've read research papers and felt the data was either insufficient or inaccessible, and how this impacted your interpretation.
* Have a discussion with your colleagues about their perspectives on publishing data so that it is findable, in formats that are accessible, and with enough descriptive metadata and documentation to be reusable. Have any of them ever cited a dataset? Why or why not? What would be needed for this to become a common practice in archaeology?

Challenge me
* Include citations to datasets, not just scholarly articles and books, relevant to your work in your next publication.
* Consider whether persistent identifiers (PIDs) should be routinely applied to all research outputs. Remember that PIDs carry an expectation of persistence (maintenance costs, etc.) but can also be used to collect metrics and to link articles and data (evidence of impact). The sketch below shows one way PIDs make citation machine-actionable.
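As a small illustration (not an activity from the guide): DOI registration agencies such as Crossref and DataCite support content negotiation, so you can ask the DOI resolver itself for a ready-formatted citation. The DOI below is a placeholder; substitute one from a dataset in ADS, tDAR, or Open Context.

```python
import urllib.request

# Resolve a DOI to a formatted citation via DOI content negotiation.
# "10.1234/example" is a placeholder; substitute a real dataset DOI.
doi = "10.1234/example"

req = urllib.request.Request(
    f"https://doi.org/{doi}",
    headers={"Accept": "text/x-bibliography; style=apa"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```

Because the citation is generated from the registered metadata, this only works as well as the metadata the depositor supplied, which is one more reason to describe your deposits carefully.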
Thing 9: Licensing

Getting started
* Research how research data is licensed in your country and which set of licenses is used most commonly.
* Discuss with colleagues whether they have licensed their data and what their experience has been.

Learn more
* Read through the licensing agreements and policies for data services and repositories, starting with ADS (http://archaeologydataservice.ac.uk/advice/termsOfUseAndAccess.xhtml), tDAR (https://www.tdar.org/about/policies/contributors-agreement/), and Open Context (https://opencontext.org/about/publishing). Consider whether these policies align with your datasets and obligations.

Challenge me
* Determine which license is appropriate for your data and, if possible, release one of your own datasets by depositing it into an archive or repository. Consider consulting a data service representative or data librarian about your selection.

Thing 10: FAIR in archaeology

Getting started
* Read through the FAIR data principles at https://www.go-fair.org/fair-principles/.
* Consider what these principles mean in practice and how each of the Things you are implementing supports FAIR archaeological data. What would it mean if every archaeologist followed these principles?

Learn more
* Watch the webinar Enabling FAIR Data at https://www.dataone.org/webinars/enabling-fair-data or Are we FAIR yet? at https://rd-alliance.org/webinar-are-we-fair-yet.

Challenge me
* Assess the FAIRness of one of your recent datasets using the FAIR self-assessment tool from ARDC (https://www.ands-nectar-rds.org.au/fair-tool). What did you learn about your data? How can you do better? A rough programmatic complement to this challenge is sketched below.
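For datasets that already have a DOI, a little of this checking can be scripted. This is a rough sketch under two assumptions: that the dataset is registered with DataCite, and that the DataCite REST API returns records in its usual {"data": {"attributes": ...}} shape. The DOI is a placeholder, and the checks cover only a few of the signals the FAIR principles ask for; the self-assessment tool remains the fuller exercise.

```python
import json
import urllib.request

# Fetch a dataset's DataCite metadata record and spot-check a few FAIR
# signals. "10.1234/example" is a placeholder; substitute a real DOI.
doi = "10.1234/example"

with urllib.request.urlopen(f"https://api.datacite.org/dois/{doi}") as resp:
    attrs = json.load(resp)["data"]["attributes"]

print("Findable: has title?   ", bool(attrs.get("titles")))
print("Findable: has creators?", bool(attrs.get("creators")))
print("Reusable: has license? ", bool(attrs.get("rightsList")))
```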