Reading Journal of eScience Librarianship: Responsible AI in Libraries and Archives

A special issue of Journal of eScience Librarianship was brought to my attention. The issue was on the topic of responsible AI in libraries and archives. I did a bit of distant reading against the issue, and outlined here are some of my take-aways.

Basic characteristics

There are nine articles in the issue for a total of 49,000 words, or an average of roughly 5,400 words each. (See the rudimentary bibliography.) Thus, based on my experience, the articles are neither particularly long nor particularly short. A rudimentary count & tabulation of unigrams, bigrams, and statistically significant keywords can be visualized, and from the results one can begin to get an idea of what is discussed in the articles:

unigrams

bigrams

keywords

For additional descriptive statistics-like analysis, see the generic index page.
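
The count & tabulation of unigrams and bigrams mentioned above can be sketched in a few lines of Python. This is only a minimal illustration, not the tooling actually used for the analysis: the stop word list is deliberately tiny, the sample text is a hypothetical stand-in for the full text of the issue, and the function names are my own.

```python
import re
from collections import Counter

# A tiny stop word list for illustration only; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def count_ngrams(tokens, n):
    """Count each run of n adjacent tokens (as tuples of length n)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sample text standing in for the full text of the issue.
sample = "Responsible AI in libraries and archives. Libraries and archives use AI."
tokens = tokenize(sample)
unigrams = count_ngrams(tokens, 1)
bigrams = count_ngrams(tokens, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Because stop words are removed before the bigrams are formed, a pair such as "libraries archives" is counted even though "and" sat between the two words. Whether that is desirable depends on the question being asked.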

Topics/themes and their distribution

Through the use of topic modeling, it is possible to enumerate over-arching themes. After removing some stop words and modeling the corpus for nine topics (because there are nine articles), the following themes presented themselves, and their distribution over the whole issue can be visualized:

        topics  weights  features
          data  0.89933  data university project research ethical new
   information  0.23128  information research chatbots chatgpt provided
       science  0.18005  science recommendation system academic service
       records  0.14033  data records learning library community japan
      keenious  0.11057  keenious library tools questions libraries
     sentiment  0.10152  sentiment analysis beatles articles historical
   descriptors  0.07133  descriptors metadata fashion costume core term
         nhgri  0.05405  nhgri archive genome human project documents
          news  0.04590  news transcripts vtna television collection

topics

When the underlying topic model is supplemented with author metadata values, it can be pivoted to address the question, "What authors discuss what topics?" From the results we can see that each author discusses something unique, but to some degree, each author also discusses the theme of data-university-project-research. For example, Elings discusses data and records:

topics over time
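
The pivot described above can be sketched as follows, assuming the topic model yields one weight per topic per document and each document carries an author value. The document identifiers come from the study carrel, but every weight below is hypothetical, and the function name is my own invention.

```python
from collections import defaultdict

# Hypothetical document-topic weights; the real values come from the topic model.
doc_topics = {
    "elings-using-2024": {"data": 0.55, "records": 0.40, "news": 0.05},
    "pastva-implementation-2024": {"data": 0.30, "keenious": 0.65, "news": 0.05},
}
# Author metadata keyed by document identifier.
doc_authors = {
    "elings-using-2024": "elings",
    "pastva-implementation-2024": "pastva",
}

def pivot_by_author(doc_topics, doc_authors):
    """Average topic weights across each author's documents."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc, weights in doc_topics.items():
        author = doc_authors[doc]
        counts[author] += 1
        for topic, weight in weights.items():
            sums[author][topic] += weight
    return {
        author: {topic: total / counts[author] for topic, total in topics.items()}
        for author, topics in sums.items()
    }

by_author = pivot_by_author(doc_topics, doc_authors)
for author, topics in by_author.items():
    # List each author's two heaviest topics.
    top = sorted(topics, key=topics.get, reverse=True)[:2]
    print(author, top)
```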

Network analysis

Modeling the special issue's articles in the form of network graphs is another way to garner what is discussed by whom and to what degree. For example, authors write articles and articles can be described with keywords. These things can be represented as node/edge combinations: authors --> articles --> keywords. Similarly, keywords have semantic relationships to other keywords, and those other keywords point to additional keywords: keywords --> keywords --> keywords. When it comes to the former, we can see how many authors discuss shared ideas (the words in the center of the graph), while they also speak to things unique to themselves. When it comes to the latter, we can see how the contents of the issue are akin to a spectrum of ideas beginning with information and moving towards analysis.

author network

keywords network
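
The authors --> articles --> keywords model can be sketched with plain dictionaries. This is a minimal illustration under stated assumptions: the edges below are hypothetical examples, and a real analysis would likely use a graph library and a visualization tool rather than this hand-rolled traversal.

```python
from collections import defaultdict

# Hypothetical edges; the real ones come from the issue's metadata and keyword extraction.
author_to_articles = {
    "pastva": ["pastva-implementation-2024"],
    "hosseini": ["hosseini-ethical-2024"],
}
article_to_keywords = {
    "pastva-implementation-2024": ["keenious", "library", "data"],
    "hosseini-ethical-2024": ["nhgri", "archive", "data"],
}

def keyword_authors(author_to_articles, article_to_keywords):
    """Follow authors --> articles --> keywords and record who reaches each keyword."""
    reached = defaultdict(set)
    for author, articles in author_to_articles.items():
        for article in articles:
            for keyword in article_to_keywords.get(article, []):
                reached[keyword].add(author)
    return reached

reached = keyword_authors(author_to_articles, article_to_keywords)
# Keywords reached by more than one author sit in the center of the graph.
shared = sorted(k for k, authors in reached.items() if len(authors) > 1)
# Keywords reached by exactly one author sit at the edges.
unique = sorted(k for k, authors in reached.items() if len(authors) == 1)
print("shared:", shared)
print("unique:", unique)
```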

Closer reading

The distant reading garnered a view of the special issue from 30,000 feet, so to speak. More importantly, it highlighted for me a number of words of interest. I call this my lexicon, and these words include: responsible, ethical, artificial intelligence, AI, data, information, science, records, keenious, sentiment, descriptors, nhgri, and news. I now ask myself, "What can I specifically learn about each of these lexicon words? What are they, and what is important about them?" In other words, I want to do some closer reading.

keenious and nhgri

What in the world are keenious and nhgri? At first glance, "keenious" appears to be some sort of adjective, and "nhgri" appears to be an OCR abomination. I first queried my full text index and learned that each of these keywords appears in one and only one document. Keenious only appears in pastva-implementation-2024 and nhgri only appears in hosseini-ethical-2024. Through the use of sentence extraction techniques, the following definitions of keenious and nhgri came to light:

Hoping to address these issues, we were intrigued by Keenious, a recommender tool that utilizes search algorithms and AI to analyze input text to suggest relevant academic articles.
Using artificial intelligence (AI) and large language models (LLMs), we recently created the first structured digital representation of the historic archives of the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health.
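
Sentence extraction of this kind can be approximated with regular expressions. The sketch below is an assumption about how one might do it, not the technique actually used: the sentence splitter is naive (it breaks on abbreviations such as "e.g."), the sample text is hypothetical, and the function names are my own.

```python
import re

def split_sentences(text):
    """Naively split after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentences_with(text, word):
    """Return sentences containing the word as a whole token, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]

# Hypothetical snippet standing in for the full text of one article.
text = (
    "Discovery is hard for new students. "
    "We were intrigued by Keenious, a recommender tool. "
    "The pilot ran for one semester."
)
print(sentences_with(text, "keenious"))
```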

What are artificial intelligence and AI?

Using a similar sentence extraction technique, I queried the corpus for things like "artificial intelligence is" and "ai is", but there were too few results. I broadened my query to sentences merely including the word "intelligence", and after perusing the results, I was not able to point to a definition of either artificial intelligence or AI. Bummer. Can you point me to definitions of such things?

Responsible and ethical

Using the same technique again, I extracted a list of sentences containing the words "responsible" or "ethical", and I've listed some of the more interesting ones below:

Seven overarching ethical issues come to light in these case studies -- privacy, consent, accuracy, labor considerations, the digital divide, bias, and transparency.
Therefore, as librarians and archivists begin using artificial intelligence (AI) to enhance library services, we also aim to interrogate the ethical issues that arise.
The eight responsible AI case studies included here show the variety of ways in which librarians and archivists are currently using AI in their practice, with a special focus on the ethical issues and considerations that arise over the course of implementing AI tools and systems.
The ethical issues raised in these case studies show that even as AI tools grow and change, our common professional values and ethical concerns as library and archives practitioners remain the same.
We hope that when other practitioners read these case studies, they will be able to translate the ethical considerations and harm-reduction strategies in the case studies to their own work with AI.
However, since we were conducting analysis of previously published newspaper articles, we did not follow a specific ethical code for interacting with the source material, as we were unable to find recommendations applying to our work exactly.
Beyond the challenges of implementing AI/machine learning technology, the project presented a number of ethical issues due to the potentially sensitive nature of the records being digitized, records which contain PII and other information about vulnerable individuals.
Responsible implementation of AI in this context relied upon these different knowledge communities to collaboratively develop a machine learning pipeline informed by considerations of privacy and ethics, and to apply an ethical framework for co-curation of the various digital resources produced by the project.
Our project raised several ethical issues, including the possibility of downsizing our workforce, the misidentification of entities that we extract from transcripts, and the potential violation of privacy involved with facial recognition software.
As we move forward in the AI/ML era, we hope to build on these early successes while keeping a careful eye on the potential ethically related pitfalls of misdescription, de-professionalization, and compromised privacy.
Our observations could help members of the GLAM (Galleries, Libraries, Archives, and Museums) community navigate the numerous ethical challenges of using AI and large language models (LLMs) when processing and exploring archives.
Since AI is trained by humans, human collaborators are essentially also responsible for AI mistakes during the de-identification process.
Instead we had to largely develop our own metrics and practices to understand if Keenious would be useful to our users, identify the ethical questions associated with implementing the tool, and determine how to assess the tool as it evolves in the future.
We must acknowledge that disruptive change from AI tools has already arrived, and libraries should be proactive in preparing themselves for whatever ethical challenges lay ahead.
On the one hand, there are legitimate concerns regarding how this new technology may be misused, which may lead to a primary "resistance attitude" that emphasizes ethical considerations related to preventing harm.

Modals

Modal sentences -- sentences containing modal verbs like "should", "ought", or "will" -- can be quite telling because such sentences are assertive. Using the lexicon as input, I output a list of modal sentences containing lexicon words. Listed here are some of the more interesting sentences:

Additionally, AI may introduce biases into the research process or make it more difficult to find and evaluate relevant sources.
While AI can identify relevant resources and provide faster access to information, it lacks the human connection and ability to have substantive conversations that librarians bring to the research process.
By working collaboratively with faculty members, libraries can ensure that AI is integrated effectively into the curriculum and research workflows, and that students are prepared for a future where AI will play an increasingly important role in their academic and professional lives.
Therefore, libraries must secure users' privacy rights when choosing new tools and services.
Academic librarians will be key stakeholders for developing chatbot literacy strategies that help students and faculty members understand both the pitfalls and potential of chatbot access.
Examining strategies for training users on how best to interact with chatbots in seeking information consequently merits attention, and librarians can play key roles in this area.
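
Filtering for modal sentences that also contain lexicon words can be sketched as a simple set intersection. This is a rough approximation under stated assumptions: the modal list extends the three verbs named above with "must", "can", and "may" (my addition), the lexicon is only a subset of the one listed earlier, and no part-of-speech tagging is done, so "will" or "can" used as a noun would be matched as well.

```python
import re

# "should", "ought", and "will" are named in the text; the rest are assumed additions.
MODALS = {"should", "ought", "will", "must", "can", "may"}
# A subset of the lexicon, for illustration.
LEXICON = {"ai", "data", "information", "ethical", "responsible"}

def is_modal_lexicon_sentence(sentence):
    """True when the sentence has at least one modal verb and one lexicon word."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(tokens & MODALS) and bool(tokens & LEXICON)

# The first and third sentences are shortened from the issue; the second is hypothetical.
sentences = [
    "Additionally, AI may introduce biases into the research process.",
    "The pilot ran for one semester.",
    "Therefore, libraries must secure users' privacy rights.",
]
selected = [s for s in sentences if is_modal_lexicon_sentence(s)]
print(selected)
```

With this reduced lexicon the third sentence is modal but is not selected, because none of its words are in the lexicon. Adding "privacy" or "libraries" to the lexicon would bring it in.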

AI on AI

I then applied a large language model indexing technique -- RAG (retrieval augmented generation) -- against the content -- AI on AI. Considering what I had learned so far, I queried the index. The results are more than plausible, and each result is associated with the one or more documents from which the result was generated:

Ask anything: What is responsibility?

Responsibility in this context entails collaborating with experts in
behavioral sciences, artificial intelligence, and social work to define
research consultation scenarios, lead case simulation construction and
analysis, review and select GPT-based products, provide technical
guidance on chatbot inquiries, and access supplemental ShareGPT data. It
also involves ensuring access to necessary resources and tools, such as
leading GPT-based products, without the need for special computing
software or capabilities.

References: 
  1. anderson-responsible-2024 / anderson (2024) (7)
  2. feng-ethical-2024 / feng (2024) (4)
  3. mannheimer-introduction-2024 / mannheimer (2024) (1)


Ask anything: What is artificial intelligence or AI?

Artificial intelligence, often referred to as AI, is a technology that
can automate tasks previously done by humans. It has the capacity to
analyze large amounts of data and generate new insights. However, AI
implementation raises concerns about biases, reproducibility, and
ethical considerations. While AI can improve research efficiency and
accuracy, it is important to recognize its limitations and potential
impacts on various aspects of work and society.

References: 
  1. feng-ethical-2024 / feng (2024) (4)
  2. mannheimer-introduction-2024 / mannheimer (2024) (3)
  3. pastva-implementation-2024 / pastva (2024) (2)
  4. anderson-responsible-2024 / anderson (2024) (1)
  5. elings-using-2024 / elings (2024) (1)
  6. mcirvin-automatic-2024 / mcirvin (2024) (1)


Ask anything: What are libraries?

Libraries are institutions that prioritize innovation and aim to create
a 21st-century library that serves as a cornerstone of world-class
research and scholarship. They seek out new tools and resources to
enrich the scholarly information ecosystem, improve the resource
discovery process, and point users to relevant research available in the
library. Libraries also acknowledge that research begins outside of the
library and aim to improve the research process while pointing back to
library resources. Additionally, libraries engage in partnerships with
faculty, educators, and service areas to enhance understanding of
library service use and provide effective support to the community.

References: 
  1. pastva-implementation-2024 / pastva (2024) (6)
  2. beltran-open-2024 / beltran (2024) (2)
  3. elings-using-2024 / elings (2024) (2)
  4. feng-ethical-2024 / feng (2024) (2)


Ask anything: What are the responsible and ethical issues surrounding
the use of artificial intelligence in libraries?

Privacy, consent, accuracy, labor considerations, the digital divide,
bias, and transparency are the responsible and ethical issues
surrounding the use of artificial intelligence in libraries as discussed
in the provided context. It is essential to address these issues when
incorporating AI tools and systems in library services to uphold ethical
standards and ensure responsible technology use.

References: 
  1. feng-ethical-2024 / feng (2024) (5)
  2. pastva-implementation-2024 / pastva (2024) (3)
  3. mannheimer-introduction-2024 / mannheimer (2024) (3)
  4. anderson-responsible-2024 / anderson (2024) (1)
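
The retrieval half of a RAG pipeline can be sketched as follows. This is a deliberately simplified stand-in, not the indexer used to produce the results above: plain term-frequency vectors and cosine similarity take the place of learned embeddings, the passages are hypothetical, and the generation step (handing the retrieved passages to a language model) is omitted entirely.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent text as a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, k=2):
    """Return the k (identifier, passage) pairs most similar to the question."""
    q = vectorize(question)
    ranked = sorted(
        passages.items(),
        key=lambda item: cosine(q, vectorize(item[1])),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical passages keyed by document identifier.
passages = {
    "mannheimer-introduction-2024": "Ethical issues include privacy, consent, bias, and transparency.",
    "pastva-implementation-2024": "Keenious is a recommender tool that suggests academic articles.",
    "hosseini-ethical-2024": "We created a digital representation of the NHGRI archives.",
}
hits = retrieve("What are the ethical issues around privacy and bias?", passages, k=1)
print(hits[0][0])
```

In a full pipeline, the identifiers of the retrieved passages are what make it possible to list references alongside each generated answer.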

Summary

Through the use of text mining, natural language processing, and a few machine learning computing techniques I analyzed -- "read" -- a special issue of Journal of eScience Librarianship on the topic of responsible AI in libraries and archives. Based on my analysis, the responsible and ethical use of AI in libraries centers on privacy and bias. Moreover, there is a perception that artificial intelligence can be used effectively in libraries but not until the issues of privacy and bias are addressed.

Epilogue

This analysis was done by first creating a Distant Reader data set -- affectionately called a "study carrel" -- and the data set as well as all of the modeling done against it is temporarily available as a zip file at the following URL:

https://distantreader.org/stacks/carrels-patrons/ericleasemorgan/jeslib-v13_n01-2024/index.zip

Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

March 13, 2024