The Code4Lib Journal Editorial Resuming our publication schedule Managing an institutional repository workflow with GitLab and a folder-based deposit system Institutional Repositories (IR) exist in a variety of configurations and in various states of development across the country. Each organization with an IR has a workflow that can range from explicitly documented and codified sets of software and human workflows, to ad hoc assortments of methods for working with faculty to acquire, process, and load items into a repository. The University of North Texas (UNT) Libraries has managed an IR called UNT Scholarly Works for the past decade but has until recently relied on ad hoc workflows. Over the past six months, we have worked to improve our processes in a way that is extensible and flexible while also providing a clear workflow for our staff to process submitted and harvested content. Our approach makes use of GitLab and its associated tools to track and communicate priorities for a multi-user team processing resources. We paired this Web-based management with a folder-based system for moving the deposited resources through a sequential set of processes that are necessary to describe, upload, and preserve the resource. This strategy can be used in a number of different applications and can serve as a set of building blocks that can be configured in different ways. This article will discuss which components of GitLab are used together as tools for tracking deposits from faculty as they move through different steps in the workflow. Likewise, the folder-based workflow queue will be presented and described as implemented at UNT, along with examples of how we have used it in different situations.
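The folder-based queue described above can be sketched in a few lines of Python. This is a minimal illustration only; the stage names and layout here are hypothetical, not UNT's actual directory structure.

```python
# Sketch of a folder-based deposit queue: each processing stage is a
# directory, and a deposit moves through the stages in sequence.
from pathlib import Path
import shutil

# Hypothetical stage names -- a real workflow would use its own.
STAGES = ["01_intake", "02_describe", "03_upload", "04_preserve"]

def init_queue(root: Path) -> None:
    """Create one directory per processing stage under the queue root."""
    for stage in STAGES:
        (root / stage).mkdir(parents=True, exist_ok=True)

def advance(root: Path, deposit: str) -> Path:
    """Move a deposit folder to the next stage and return its new path."""
    for i, stage in enumerate(STAGES[:-1]):
        src = root / stage / deposit
        if src.exists():
            dst = root / STAGES[i + 1] / deposit
            shutil.move(str(src), str(dst))
            return dst
    raise FileNotFoundError(f"{deposit} not found in any advanceable stage")
```

Because the queue state is just directories on disk, staff can inspect or correct it with an ordinary file manager, while GitLab issues carry the discussion and priorities alongside it.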
Customizing Alma and Primo for Home & Locker Delivery Like many Ex Libris libraries in Fall 2020, our library at California State University, Northridge (CSUN) was not physically open to the public during the 2020-2021 academic year, but we wanted to continue to support the research and study needs of our over 38,000 university students and 4,000 faculty and staff. This article will explain our Alma and Primo implementation to allow for home mail delivery of physical items, including policy decisions, workflow changes, customization of request forms through labels and delivery skins, customization of Alma letters, a Python solution to add the “home” address type to patron addresses to make it all work, and will include relevant code samples in Python, XSL, CSS, XML, and JSON. In Spring 2021, we will add the on-site locker delivery option in addition to home delivery, and this article will include new system changes made for that option. GaNCH: Using Linked Open Data for Georgia’s Natural, Cultural and Historic Organizations’ Disaster Response In June 2019, the Atlanta University Center Robert W. Woodruff Library received a LYRASIS Catalyst Fund grant to support the creation of a publicly editable directory of Georgia’s Natural, Cultural and Historical Organizations (NCHs), allowing for quick retrieval of location and contact information for disaster response. By the end of the project, over 1,900 entries for NCH organizations in Georgia were compiled, updated, and uploaded to Wikidata, the linked open data database from the Wikimedia Foundation. These entries included directory contact information and GIS coordinates that appear on a map presented on the GaNCH project website (https://ganch.auctr.edu/), allowing emergency responders to quickly search for NCHs by region and county in the event of a disaster. 
In this article we discuss the design principles, methods, and challenges encountered in building and implementing this tool, including the impact the tool has had on statewide disaster response after implementation. Archive This Moment D.C.: A Case Study of Participatory Collecting During COVID-19 When the COVID-19 pandemic brought life in Washington, D.C. to a standstill in March 2020, staff at DC Public Library began looking for ways to document how this historic event was affecting everyday life. Recognizing the value of first-person accounts for historical research, staff launched Archive This Moment D.C. to preserve the story of daily life in the District during the stay-at-home order. Materials were collected from public Instagram and Twitter posts submitted through the hashtag #archivethismomentdc. In addition to social media, creators also submitted materials using an Airtable webform set up for the project and through email. Over 2,000 digital files were collected. This article will discuss the planning, professional collaboration, promotion, selection, access, and lessons learned from the project; as well as the technical setup, collection strategies, and metadata requirements. In particular, this article will include a discussion of the evolving collection scope of the project and the need for clear ethical guidelines surrounding privacy when collecting materials in real-time. Advancing ARKs in the Historical Ontology Space This paper presents the application of Archival Resource Keys (ARKs) for persistent identification and resolution of concepts in historical ontologies. Our use case is the 1910 Library of Congress Subject Headings (LCSH), which we have converted to the Simple Knowledge Organization System (SKOS) format and will use for representing a corpus of historical Encyclopedia Britannica articles. 
We report on the steps taken to assign ARKs in support of the Nineteenth-Century Knowledge Project, where we are using the HIVE vocabulary tool to automatically assign subject metadata from both the 1910 LCSH and the contemporary LCSH faceted, topical vocabulary to enable the study of the evolution of knowledge. Considered Content: a Design System for Equity, Accessibility, and Sustainability The University of Minnesota Libraries developed and applied a principles-based design system to their Health Sciences Library website. With the design system at its center, the revised site was able to achieve accessible, ethical, inclusive, sustainable, responsible, and universal design. The final site was built with elegantly accessible semantic HTML-focused code on Drupal 8 with highly curated and considered content, meeting and exceeding WCAG 2.1 AA guidance and addressing cognitive and learning considerations through the use of plain language, templated pages for consistent page-level organization, and no hidden content. As a result, the site better supports all users regardless of their abilities, attention level, mental status, reading level, and reliability of their internet connection, all of which are especially critical now as an elevated number of people experience crises, anxieties, and depression. Robustifying Links To Combat Reference Rot Links to web resources frequently break, and linked content can change at unpredictable rates. These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information. In this paper, we highlight the significance of reference rot, provide an overview of existing techniques and their characteristics to address it, and introduce our Robust Links approach, including its web service and underlying API. Robustifying links offers a proactive, uniform, and machine-actionable way to combat reference rot. 
In addition, we discuss our reasoning and the measures taken to keep the approach functional for the long term. To showcase our approach, we have robustified all links in this article. Machine Learning Based Chat Analysis The BYU library implemented a Machine Learning-based tool to perform various text analysis tasks on transcripts of chat-based interactions between patrons and librarians. These text analysis tasks included estimating patron satisfaction and classifying queries into various categories such as Research/Reference, Directional, Tech/Troubleshooting, Policy/Procedure, and others. An accuracy of 78% or better was achieved for each category. This paper details the implementation and explores potential applications for the text analysis tool. Always Be Migrating At the University of California, Los Angeles, the Digital Library Program is in the midst of a large, multi-faceted migration project. This article presents a narrative of migration and a new mindset for technology and library staff in their ever-changing infrastructure and systems. This article posits that migration from system to system should be integrated into normal activities so that it is not a singular event or major project, but so that it is a process built into the core activities of a unit. Editorial: For Pandemic Times Such as This A pandemic changes the world and changes libraries. Open Source Tools for Scaling Data Curation at QDR This paper describes the development of services and tools for scaling data curation services at the Qualitative Data Repository (QDR). Through a set of open-source tools, semi-automated workflows, and extensions to the Dataverse platform, our team has built services for curators to efficiently and effectively publish collections of qualitatively derived data. The contributions we seek to make in this paper are as follows: 1. We describe ‘human-in-the-loop’ curation and the tools that facilitate this model at QDR; 2.
We provide an in-depth discussion of the design and implementation of these tools, including applications specific to the Dataverse software repository, as well as standalone archiving tools written in R; and 3. We highlight the role of providing a service layer for data discovery and accessibility of qualitative data. Keywords: Data curation; open-source; qualitative data From Text to Map: Combining Named Entity Recognition and Geographic Information Systems This tutorial shows readers how to leverage the power of named entity recognition (NER) and geographic information systems (GIS) to extract place names from text, geocode them, and create a public-facing map. This process is highly useful across disciplines. For example, it can be used to generate maps from historical primary sources, works of literature set in the real world, and corpora of academic scholarship. In order to lead the reader through this process, the authors work with a 500-article sample of the COVID-19 Open Research Dataset Challenge (CORD-19) dataset. As of the date of writing, CORD-19 includes 45,000 full-text articles with metadata. Using this sample, the authors demonstrate how to extract locations from the full-text with the spaCy library in Python, highlight methods to clean up the extracted data with the Pandas library, and finally teach the reader how to create an interactive map of the places using ArcGIS Online. The processes and code are described in a manner that is reusable for any corpus of text. Using Integrated Library Systems and Open Data to Analyze Library Cardholders The Harrison Public Library in Westchester County, New York operates two library buildings in Harrison: The Richard E. Halperin Memorial Library Building (the library’s main building, located in downtown Harrison) and a West Harrison branch location. As part of its latest three-year strategic plan, the library sought to use existing resources to improve understanding of its cardholders at both locations.
To do so, we needed to link the circulation data in our integrated library system, Evergreen, to geographic data and demographic data. We decided to build a geodemographic heatmap that incorporated all three aforementioned types of data. Using Evergreen, American Community Survey (ACS) data, and Google Maps, we plotted each cardholder’s residence on a map, added census boundaries (called tracts) and our town’s borders to the map, and produced summary statistics for each tract detailing its demographics and the library card usage of its residents. In this article, we describe how we acquired the necessary data and built the heatmap. We also touch on how we safeguarded the data while building the heatmap, which is an internal tool available only to select authorized staff members. Finally, we discuss what we learned from the heatmap and how libraries can use open data to benefit their communities. Update OCLC Holdings Without Paying Additional Fees: A Patchwork Approach Accurate OCLC holdings are vital for interlibrary loan transactions. However, over time, weeding projects, replacing lost or damaged materials, and human error can leave a library with a catalog that is no longer accurately reflected in OCLC. While OCLC offers reclamation services to bring poorly maintained collections up-to-date, the associated fee may be cost prohibitive for libraries with limited budgets. This article will describe the process used at Austin Peay State University to identify, isolate, and update holdings using OCLC Collection Manager queries, MarcEdit, Excel, and Python. Some portions of this process are completed using basic coding; however, troubleshooting techniques will be included for those with limited previous experience.
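The core of a holdings reconciliation like the one described above is a set comparison between the OCLC numbers in the local catalog and those OCLC has on file. The following sketch is illustrative only; the prefix normalization it performs ("(OCoLC)", "ocm"/"ocn"/"on", leading zeros) is an assumption about typical MARC 035 data, not Austin Peay's exact process.

```python
# Compare OCLC numbers from a local catalog export with numbers from an
# OCLC Collection Manager export, yielding holdings to set and to delete.
import re

def normalize(oclc: str) -> str:
    """Strip common prefixes and leading zeros so numbers from both
    sources compare equal (e.g. '(OCoLC)00012345' -> '12345')."""
    return re.sub(r"^(\(OCoLC\))?(ocm|ocn|on)?0*", "", oclc.strip())

def holdings_diff(local: list[str], oclc: list[str]) -> tuple[set, set]:
    """Return (numbers held locally but not set in OCLC,
               numbers set in OCLC but no longer held locally)."""
    local_set = {normalize(n) for n in local}
    oclc_set = {normalize(n) for n in oclc}
    return local_set - oclc_set, oclc_set - local_set
```

The two resulting sets map directly onto the two batch actions in OCLC: setting holdings for the first, and deleting holdings for the second.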
Data reuse in linked data projects: a comparison of Alma and Share-VDE BIBFRAME networks This article presents an analysis of the enrichment, transformation, and clustering used by vendors Casalini Libri/@CULT and Ex Libris for their respective conversions of MARC data to BIBFRAME. The analysis considers the source MARC21 data used by Alma then the enrichment and transformation of MARC21 data from Share-VDE partner libraries. The clustering of linked data into a BIBFRAME network is a key outcome of data reuse in linked data projects and fundamental to the improvement of the discovery of library collections on the web and within search systems. CollectionBuilder-CONTENTdm: Developing a Static Web ‘Skin’ for CONTENTdm-based Digital Collections Unsatisfied with customization options for CONTENTdm, librarians at University of Idaho Library have been using a modern static web approach to creating digital exhibit websites that sit in front of the digital repository. This "skin" is designed to provide users with new pathways to discover and explore collection content and context. This article describes the concepts behind the approach and how it has developed into an open source, data-driven tool called CollectionBuilder-CONTENTdm. The authors outline the design decisions and principles guiding the development of CollectionBuilder, and detail how a version is used at the University of Idaho Library to collaboratively build digital collections and digital scholarship projects. Automated Collections Workflows in GOBI: Using Python to Scrape for Purchase Options The NC State University Libraries has developed a tool for querying GOBI, our print and ebook ordering vendor platform, to automate monthly collections reports. These reports detail purchase options for missing or long-overdue items, as well as popular items with multiple holds. GOBI does not offer an API, forcing staff to conduct manual title-by-title searches that previously took up to 15 hours per month.
To make this process more efficient, we wrote a Python script that automates title searches and the extraction of key data (price, date of publication, binding type) from GOBI. This tool can gather data for hundreds of titles in half an hour or less, freeing up time for other projects. This article will describe the process of creating this script, as well as how it finds and selects data in GOBI. It will also discuss how these results are paired with NC State’s holdings data to create reports for collection managers. Lastly, the article will examine obstacles that were experienced in the creation of the tool and offer recommendations for other organizations seeking to automate collections workflows. Testing remote access to e-resources with CodeceptJS At the Badische Landesbibliothek Karlsruhe (BLB) we offer a variety of e-resources with different access requirements. On the one hand, there is free access to open access material, no matter where you are. On the other hand, there are e-resources that you can only access when you are in the rooms of the BLB. We also offer e-resources that you can access from anywhere, but you must have a library account for authentication to gain access. To test the functionality of these access methods, we have created a project to automatically test the entire process from searching our catalogue, selecting a hit, logging in to the provider's site and checking the results. For this we use the end-to-end testing framework CodeceptJS. Editorial An abundance of information sharing. Leveraging Google Drive for Digital Library Object Storage This article will describe a process at the University of Kentucky Libraries for utilizing an unlimited Google Drive for Education account for digital library object storage. For a number of recent digital library projects, we have used Google Drive for both archival file storage and web derivative file storage.
As a part of the process, a Google Drive API script is deployed in order to automate the gathering of Google Drive object identifiers. Also, a custom Omeka plugin was developed to allow for referencing web deliverable files within a web publishing platform via object linking and embedding. For a number of new digital library projects, we have moved toward a small VM approach to digital library management where the VM serves as a web front end but not a storage node. This has necessitated alternative approaches to storing web addressable digital library objects. One option is the use of Google Drive for storing digital objects. An overview of our approach is included in this article as well as links to open source code we adopted and more open source code we produced. Building a Library Search Infrastructure with Elasticsearch This article discusses our implementation of an Elastic cluster to address our search, search administration and indexing needs, how it integrates in our technology infrastructure, and finally takes a close look at the way that we built a reusable, dynamic search engine that powers our digital repository search. We cover the lessons learned with our early implementations and how to address them to lay the groundwork for a scalable, networked search environment that can also be applied to alternative search engines such as Solr. How to Use an API Management Platform to Easily Build Local Web Apps Setting up an API management platform like DreamFactory can open up a lot of possibilities for potential projects within your library. With an automatically generated restful API, the University Libraries at Virginia Tech have been able to create applications for gathering walk-in data and reference questions, public polling apps, feedback systems for service points, data dashboards and more.
This article will describe what an API management platform is, why you might want one, and the types of potential projects that can quickly be put together by your local web developer. Git and GitLab in Library Website Change Management Workflows Library websites can benefit from a separate development environment and a robust change management workflow, especially when there are multiple authors. This article details how the Oakland University William Beaumont School of Medicine Library uses Git and GitLab in a change management workflow with a serverless development environment for their website development team. Git tracks changes to the code, allowing changes to be made and tested in a separate branch before being merged back into the website. GitLab adds features such as issue tracking and discussion threads to Git to facilitate communication and planning. Adoption of these tools and this workflow has dramatically improved the organization and efficiency of the OUWB Medical Library web development team, and the authors hope that by sharing their experience, others may benefit as well. Experimenting with a Machine Generated Annotations Pipeline The UCLA Library reorganized its software developers into focused subteams with one, the Labs Team, dedicated to conducting experiments. In this article we describe our first attempt at conducting a software development experiment, in which we attempted to improve our digital library’s search results with metadata from cloud-based image tagging services. We explore the findings and discuss the lessons learned from our first attempt at running an experiment. Leveraging the RBMS/BSC Latin Place Names File with Python To answer the relatively straight-forward question “Which rare materials in my library catalog were published in Venice?” requires an advanced knowledge of geography, language, orthography, alphabet graphical changes, cataloging standards, transcription practices, and data analysis.
The imprint statements of rare materials transcribe place names faithfully as they appear on the piece itself, such as Venetus or Venetiae, rather than a recognizable and contemporary form of place name, such as Venice, Italy. Rare materials catalogers recognize this geographic discoverability and selection issue and address it through standardization. To add consistency and normalization to imprint locations, rare materials catalogers utilize hierarchical place names to create a special imprint index. However, this normalized and contemporary form of place name is often missing from legacy bibliographic records. This article demonstrates using a traditional rare materials cataloging aid, the RBMS/BSC Latin Place Names File, with programming tools, Jupyter Notebook and Python, to retrospectively populate a special imprint index for 17th-century rare materials. This methodology enriched 1,487 MAchine Readable Cataloging (MARC) bibliographic records with hierarchical place names (MARC 752 fields) as part of a small pilot project. This article details a partially automated solution to this geographic discoverability and selection issue; however, a human component is still ultimately required to fully optimize the bibliographic data. Tweeting Tennessee’s Collections: A Case Study of a Digital Collections Twitterbot Implementation This article demonstrates how a Twitterbot can be used as an inclusive outreach initiative that breaks down the barriers between the web and the reading room to share materials with the public. These resources include postcards, music manuscripts, photographs, cartoons and any other digitized materials. Once in place, Twitterbots allow physical materials to converge with the technical and social space of the Web. Twitterbots are ideal for busy professionals because they allow librarians to make meaningful impressions on users without requiring a large time investment.
This article covers the recent implementation of a digital collections bot (@UTKDigCollBot) at the University of Tennessee, Knoxville (UTK), and provides documentation and advice on how you might develop a bot to highlight materials at your own institution. Building Strong User Experiences in LibGuides with Bootstrapr and Reviewr With nearly fifty subject librarians creating LibGuides, the LibGuides Management Team at Notre Dame needed a way to both empower guide authors to take advantage of the powerful functionality afforded by the Bootstrap framework native to LibGuides, and to ensure new and extant library guides conformed to brand/identity standards and the best practices of user experience (UX) design. To accomplish this, we developed an online handbook to teach processes and enforce styles; a web app to create Twitter Bootstrap components for use in guides (Bootstrapr); and a web app to radically speed the review and remediation of guides, as well as better communicate our changes to guide authors (Reviewr). This article describes our use of these three applications to balance empowering guide authors against usefully constraining them to organizational standards for user experience. We offer all of these tools as FOSS under an MIT license so that others may freely adapt them for use in their own organization. IIIF by the Numbers The UCLA Library began work on building a suite of services to support IIIF for their digital collections. The services perform image transformations and delivery as well as manifest generation and delivery. The team was unsure about whether they should use local or cloud-based infrastructure for these services, so they conducted some experiments on multiple infrastructure configurations and tested them in scenarios with varying dimensions. 
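An image service of the kind the IIIF abstract above describes must answer requests whose URL path follows the IIIF Image API parameter order {region}/{size}/{rotation}/{quality}.{format}. The following sketch builds such a URL per the Image API 2.1 syntax; the base URL and identifier are hypothetical, not UCLA's actual endpoints.

```python
# Construct a IIIF Image API 2.1 request URL. Parameter defaults request
# the full image at full size, unrotated, in default quality as JPEG.
def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "full", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"
```

Because every transformation is encoded in the URL, such requests cache well, which is one reason infrastructure benchmarking of the kind described above matters: the same image may be requested at many region/size combinations.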
Trust, But Verify: Auditing Vendor-Supplied Accessibility Claims Despite a long-overdue push to improve the accessibility of our libraries’ online presences, much of what we offer to our patrons comes from third party vendors: discovery layers, OPACs, subscription databases, and so on. We can’t directly affect the accessibility of the content on these platforms, but rely on vendors to design and test their systems and report on their accessibility through Voluntary Product Accessibility Templates (VPATs). But VPATs are self-reported. What if we want to verify our vendors’ claims? We can’t thoroughly test the accessibility of hundreds of vendor systems, can we? In this paper, we propose a simple methodology for spot-checking VPATs. Since most websites struggle with the same accessibility issues, spot checking particular success criteria in a library vendor VPAT can tip us off to whether the VPAT as a whole can be trusted. Our methodology combines automated and manual checking, and can be done without any expensive software or complex training. What’s more, we are creating a repository to share VPAT audit results with others, so that we needn’t all audit the VPATs of all our systems.
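A spot check of the kind the VPAT-auditing abstract describes can start with a single commonly failed success criterion. As an illustrative example only (not the authors' actual methodology), the sketch below scans a page's HTML for images lacking alt text (WCAG 1.1.1) using just the Python standard library.

```python
# Count <img> tags with and without a non-empty alt attribute, as a quick
# automated signal for a WCAG 1.1.1 spot check of a vendor page.
from html.parser import HTMLParser

class AltTextAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.with_alt = 0
        self.missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # note: alt="" (decorative) is flagged for manual review
                self.with_alt += 1
            else:
                self.missing_alt += 1
```

A nonzero `missing_alt` count on a vendor's main interface page is exactly the kind of cheap, repeatable signal that suggests the corresponding VPAT claim deserves closer manual scrutiny.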