Katherine Rawson
Presentation for “Digital Scholarship in Action: Research”
2016 MLA Convention | Austin, Texas

Today, I am going to talk about building humanities data sets for research.

Humanities data sets are often made by cobbling or accretion, sometimes both. Building these data sets often involves many people. They are
- institutionally built,
- made by collections of scholars,
- made with the help of the public through crowdsourcing.

Examples:
HathiTrust Worksets - digitizing libraries, HathiTrust, scholars
Curating Menus - librarians, librarians, crowds of transcribers, scholars

The peopled nature of this work can be a virtue or a complication (often both).

So if data sets are made by many people, what are the stakes for humanities researchers? As scholars, we want to:
- maintain research value
- act ethically around the work of others
- attend to the ways that the perspectives of those involved shaped the data set

I am going to focus on the latter two (in ways that I think support the first).

Attend to the Work of Others

We want to act in ways that are ethically and intellectually responsible. Things like collaborators’ bills of rights can help:
NEH: http://mcpress.media-commons.org/offthetracks/part-one-models-for-collaboration-career-paths-acquiring-institutional-support-and-transformation-in-the-field/a-collaboration/collaborators%E2%80%99-bill-of-rights/
UCLA: http://www.cdh.ucla.edu/news-events/a-student-collaborators-bill-of-rights/

But sometimes the work is already done. What about when collaborators are not living or are not named? In this case, we might consider narratives of acknowledgement.
For example, Curating Menus relies on the work of anonymous people who contributed to NYPL’s What’s on the Menu project and on librarians and donors who are now deceased. We decided to write the story of these people and to create a data dictionary that includes a place for agents. For each kind of data, we describe not only what it is, but also who made it.

Finally, if we are using the work of others, we should, as much as possible, share our data. This is not unlike the notion of the public trust that Lisa Rhody discussed yesterday, only the bar is lower: it is much easier to share data than to maintain tools and projects.

Attend to the Perspective of Others

How do we attend to the ways that the perspectives of the people who made our data sets shape them? Miriam Posner raised this at Keystone DH as she discussed the diversity and contingency of critical humanities discourse in comparison to the data sources we sometimes have to work with: what do we do about the fact that the census (and the numerous agents involved in the census) says that gender is a binary? I think this is an important and promising question for digital humanities scholars. Today, I am going to offer thoughts on one approach.

In answering these kinds of questions for Curating Menus, Trevor Munoz and I did a few things. First, we researched and wrote about where our data came from. Then we began experimenting with data structures. We have been working with indexing, because this allows us to maintain the frameworks of the other people who made the data (and to explicitly acknowledge their frameworks), while adding our own. Instead of modifying the data that came before us, we are layering on top of it. The goal is to do this within a linked open data framework, so that many people can transparently add to, connect, and manipulate the data, while being able to see the hands of others in it.
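
To make the layering idea concrete, here is a minimal sketch in Python. It is not the Curating Menus implementation, and it is not linked open data; it only illustrates an index that points at earlier records without changing them and that records who made each layer. All record identifiers, field names, and agent labels below are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType

# Hypothetical source records standing in for data made by earlier hands.
# MappingProxyType gives a read-only view, so this layer cannot be edited.
SOURCE = MappingProxyType({
    "dish/1": MappingProxyType(
        {"name": "Potatoes, mashed", "made_by": "volunteer transcribers"}),
    "dish/2": MappingProxyType(
        {"name": "Mashed potatoes", "made_by": "volunteer transcribers"}),
})


@dataclass(frozen=True)
class IndexEntry:
    """One entry in a later layer: a heading that points at a source record."""
    source_id: str  # identifier of the untouched source record
    heading: str    # the term added by the later layer
    made_by: str    # the agent responsible for this entry


def build_index(entries):
    """Group index entries by heading, refusing entries that point nowhere."""
    index = {}
    for entry in entries:
        if entry.source_id not in SOURCE:
            raise KeyError(f"unknown source record: {entry.source_id}")
        index.setdefault(entry.heading, []).append(entry)
    return index


def lookup(index, heading):
    """Return (original name, original maker, index maker) for a heading."""
    return [
        (SOURCE[e.source_id]["name"], SOURCE[e.source_id]["made_by"], e.made_by)
        for e in index.get(heading, [])
    ]


# A second layer, added on top of the source rather than written into it.
INDEX = build_index([
    IndexEntry("dish/1", "mashed potatoes", "project scholars"),
    IndexEntry("dish/2", "mashed potatoes", "project scholars"),
])
```

Calling lookup(INDEX, "mashed potatoes") brings back both original names as they were transcribed, alongside the agents for each layer, so the earlier framework stays visible next to the later one.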