Archival Connections

One of the questions I’ve been grappling with as part of the Archival Connections research project is simple: Is there a future for the finding aid? I’m inclined to think not, at least not in the form we are used to.

Looking to the future, I recently had the chance to propose something slightly different: a potential project, submitted for funding through an Amazon Research Grant. While the jury is still out on the proposal (an answer is expected in mid-December), I’d like to share a copy of it, Scaling Machine-Assisted Description of Historical Materials.

The idea described there would build on the emerging digital repository and library infrastructure that the University of Illinois Library is developing. It would integrate natural language processing and named entity recognition to index materials and provide relational browsing pathways alongside file-system access. I’ll have more to say about this at the Society of Indiana Archivists meeting tomorrow.
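To make the idea of relational browsing a little more concrete, here is a minimal Python sketch. It assumes a named entity recognizer has already run over each file and produced a list of entity strings (the recognizer itself is not shown), and the function names and sample paths are hypothetical. It inverts that output into an entity-to-files index and counts which entities appear together, which is one simple way to generate related-item links alongside the folder hierarchy.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def build_entity_index(entities_by_file: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert file -> entities into entity -> sorted list of files (hypothetical helper)."""
    index = defaultdict(set)
    for path, entities in entities_by_file.items():
        for entity in entities:
            index[entity].add(path)
    return {entity: sorted(paths) for entity, paths in index.items()}


def cooccurrence_links(entities_by_file: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Count the files in which each pair of entities appears together."""
    links = defaultdict(int)
    for entities in entities_by_file.values():
        # Deduplicate and sort so each pair is counted once per file, in a stable order.
        for a, b in combinations(sorted(set(entities)), 2):
            links[(a, b)] += 1
    return dict(links)


# Hypothetical NER output for two files (illustrative only).
sample = {
    "folder01/doc_a.txt": ["Urbana", "University of Illinois"],
    "folder02/doc_b.txt": ["University of Illinois", "Board of Trustees"],
}
print(build_entity_index(sample)["University of Illinois"])
# prints ['folder01/doc_a.txt', 'folder02/doc_b.txt']
```

A browsing interface could then offer, from any file, links to every other file sharing one of its entities, without disturbing the original file-system arrangement.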

Later this week, I’ll be introducing the Archival Connections Project at the Society of Indiana Archivists meeting. During the first year of the project, one focus of my work was evaluating Social Feed Manager (SFM), a tool developed by George Washington University Libraries, and developing recommendations for its use.

My full report is here, for those interested: https://gwu-libraries.github.io/sfm-ui/resources/SFMReportProm2017.pdf.

Without going into too much detail, here is what I learned while working on this report, at least as far as it relates to the Archival Connections project:

First and foremost: Data models matter. As I indicated in the report, SFM’s underlying database and data model are both simple and elegant. Because the application focuses on doing one thing and doing it well, the database translates directly into user interface components that make the application a joy to use. The project team did hire a usability consultant, but the changes the team made in response to the consultant’s report simply added polish to an already strong interface. While I won’t be so impolitic as to compare SFM to other archival tools, the application works well in part because its data objects, and the tables that underlie them, represent things that exist in the real world, not abstractions or vague concepts that are hard for staff to understand or for programmers to translate into an interface.
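To illustrate what I mean by data objects that represent real-world things, here is a simplified Python sketch. The class names and fields are my own hypothetical illustration, loosely echoing the way SFM organizes seeds into collections that are harvested over time; they are not SFM’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Seed:
    """Something an archivist chooses to follow, such as an account handle (hypothetical)."""
    token: str


@dataclass
class Harvest:
    """One run of the harvester over a collection's seeds (hypothetical)."""
    started: datetime
    items_captured: int


@dataclass
class Collection:
    """A named group of seeds, owned by a staff member and harvested over time (hypothetical)."""
    name: str
    owner: str
    seeds: List[Seed] = field(default_factory=list)
    harvests: List[Harvest] = field(default_factory=list)

    def total_items(self) -> int:
        # Sum the items captured across every harvest run of this collection.
        return sum(h.items_captured for h in self.harvests)
```

Each class corresponds to something a staff member can point to (an account being followed, a harvest run, a named collection), which is the property that makes a model like this easy to turn into screens.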

Second: Archivists should become better API consumers. One of the things that fascinates me most about SFM is that it connects directly to the Twitter API and slurps up all of the metadata the API supplies. Thinking broadly, the archival and information professions are doing a lot to build and use our own APIs and data providers, but less to interact with those supplied by the data companies that now order our lives. For example, do we have tools, built on those APIs, that line archivists (as opposed to technical staff) can use to (a) connect to Google Drive, Box.com, Outlook 365, or Facebook, (b) harvest records from those systems, and (c) prep them for deposit in a digital repository? Not that I am aware of, but we should. Without them, we can’t capture records and preservation metadata at or near the point at which records are created (h/t David Bearman).
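Step (c) is the most generic part of that workflow, so here is a minimal Python sketch of it, assuming records have already been harvested from some platform API as JSON-like dictionaries (the harvesting itself is platform-specific and not shown). The function name, file layout, and metadata keys are hypothetical; the manifest lines follow the BagIt convention of a checksum followed by a file path, but the output is not a complete bag.

```python
import hashlib
import json
from datetime import datetime, timezone


def prepare_for_deposit(records, source, captured_at=None):
    """Serialize harvested records and record fixity and capture metadata (hypothetical helper).

    records: list of dicts, assumed to come from a platform API.
    source: a label for the platform the records came from.
    Returns (files, manifest_lines, metadata).
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    files = {}
    for i, record in enumerate(records):
        path = f"data/record_{i:05d}.json"
        # sort_keys makes the serialization, and therefore the checksum, reproducible.
        files[path] = json.dumps(record, sort_keys=True).encode("utf-8")
    # BagIt-style manifest lines: "<sha256 checksum> <path>".
    manifest = [f"{hashlib.sha256(body).hexdigest()} {path}" for path, body in files.items()]
    metadata = {
        "source": source,
        "captured_at": captured_at.isoformat(),
        "record_count": len(files),
    }
    return files, manifest, metadata
```

Capturing the checksum and capture time at the moment of harvest is the point: it is much harder to reconstruct either after records have drifted through several systems.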

Third: The metadata that APIs supply is a double-edged sword. Once you dig into the JSON files, you quickly see that they contain a lot of what the OAIS reference model calls preservation metadata: the dates and times tweets were published, the times the tool captured them, and so on. As a baseline, such data will help people make future claims about the authenticity of these records or mine them as data. But given the relative lack of descriptive metadata, and the fact that bots and other non-human agents control so many Twitter accounts (not to mention that many users’ handles tell you little or nothing about their real identities), this metadata is not in itself sufficient to establish whether something is authentic or to wring much value from the dataset. That requires (wait for it . . . ) a person interpreting the records using all of the intelligence they can muster.
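As a rough illustration of what these fields do and do not tell you, the Python sketch below pulls a few preservation-relevant values out of a tweet in Twitter’s v1.1 JSON format, which uses created_at, id_str, and user.screen_name. The capture time is assumed to come from the harvesting tool rather than from Twitter, and the function name and sample values are hypothetical.

```python
from datetime import datetime, timezone

# Twitter v1.1 timestamp format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_V1_DATE = "%a %b %d %H:%M:%S %z %Y"


def preservation_summary(tweet, captured_at):
    """Extract when and from which account a tweet was published, plus capture lag (hypothetical).

    Nothing here tells you who actually controls the account or whether
    the content is authentic; that still takes human interpretation.
    """
    published = datetime.strptime(tweet["created_at"], TWITTER_V1_DATE)
    return {
        "id": tweet["id_str"],
        "account": tweet["user"]["screen_name"],
        "published": published.isoformat(),
        "captured": captured_at.isoformat(),
        # Both datetimes are timezone-aware, so the subtraction is well defined.
        "capture_lag_seconds": (captured_at - published).total_seconds(),
    }


sample_tweet = {
    "id_str": "1",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "user": {"screen_name": "example_handle"},
}
print(preservation_summary(sample_tweet, datetime(2018, 10, 10, 20, 20, 24, tzinfo=timezone.utc)))
```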

Finally: Aggregations matter now more than ever. I was a bit taken aback a few months ago when the committee charged with revising DACS made no mention of provenance, original order, arrangement, or levels of description in its draft principles. While the committee’s work had much to recommend it, the omission seems like an oversight, and an important one. My work with SFM has convinced me that aggregations and provenance are even more important when working with records harvested from the cloud. Given the free-floating, intertwined nature of records found in social media and other ‘cloud’ platforms, it seems to me that the act of capturing records itself results in an aggregation. For instance, SFM generates a set of tweets, but that set is the result of an archivist’s activity in shaping the collection. This aggregation, and the provenance behind it, deserve to be described as such, with as much transparency about the archivist’s role as possible. In short, archivists can and must do a good job of arranging and describing materials at the collection or series level. There is no workaround for this core archival function, even, or perhaps especially, when extracting item-based metadata and records from the platforms that now rule many people’s daily work and social lives.
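One way to keep the archivist’s role visible is to record the selection decisions alongside the harvested items. The Python sketch below is a hypothetical illustration, not a DACS element set or an SFM feature: it stores who shaped the aggregation, which seeds they chose, and the harvest window, and renders them as a short provenance note.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class HarvestAggregation:
    """A harvested set of records described as an aggregation (hypothetical fields)."""
    title: str
    archivist: str     # who shaped the aggregation
    seeds: List[str]   # accounts or queries the archivist selected
    start: date
    end: date
    item_count: int

    def provenance_note(self) -> str:
        # Make the archivist's selection decisions explicit in the description.
        return (
            f"{self.title}: {self.item_count} items harvested {self.start.isoformat()} to "
            f"{self.end.isoformat()} from {len(self.seeds)} seed(s) ({', '.join(self.seeds)}) "
            f"selected by {self.archivist}."
        )
```

The design choice is simply that the aggregation, not the individual tweet, carries the account of who chose what and when.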

Platform Monopolies and Archives

SIA Workshop Links

Scaling Machine-Assisted Description of Historical Records

Social Feed Manager Takeaways

Arrangement and Description in the Cloud: A Preliminary Analysis