Generative-AI Summarization

Ann Blair's book Too Much to Know overflows with descriptions of how scholars before the modern age dealt with information overload. [1] One of the more frequently used techniques was summarization. With the advent of generative AI, it has become almost trivial to create more-than-plausible summaries of documents.

The linked Python script is an example. Given the path to a plain text file, the script loads a configured large language model, vectorizes the given file, compares the two, and outputs a three-sentence summary. I extended the script to work in batch mode, and I have used it to summarize the following collections:

each chapter in each book written by Jane Austen
250 journal articles on the topic of rheumatoid arthritis
another 250 journal articles on the topic of climate change
130 articles on the topic of cataloging
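The linked script is not reproduced here. As a rough, dependency-free illustration of the same input and output (plain text in, a three-sentence summary out, optionally over a whole directory), the following Python sketch swaps in a much simpler technique: extractive summarization. It does not use a large language model. Instead, it turns the whole document and each of its sentences into bag-of-words vectors, compares each sentence to the document with cosine similarity, and keeps the three best-matching sentences in their original order. The stop-word list, the function names summarize and summarize_directory, and the tab-delimited output are assumptions made for illustration, not features of the linked script.

```python
import math
import re
import sys
from collections import Counter
from pathlib import Path

# A deliberately tiny stop-word list; an assumption for illustration only
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
             "was", "it", "that", "for", "with", "as", "by", "at", "be"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text: str, n: int = 3) -> str:
    """Return the n sentences most similar to the whole document, in reading order."""
    # Naive sentence splitting on ., !, or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    doc_vector = Counter(tokenize(text))
    scores = [cosine(Counter(tokenize(s)), doc_vector) for s in sentences]
    # Rank sentence indexes by score, keep the top n, then restore reading order
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n]))


def summarize_directory(directory: str, n: int = 3) -> dict[str, str]:
    """Batch mode (hypothetical): summarize every .txt file in a directory."""
    return {path.name: summarize(path.read_text(encoding="utf-8"), n)
            for path in sorted(Path(directory).glob("*.txt"))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python summarize.py <directory of plain text files>
    for name, summary in summarize_directory(sys.argv[1]).items():
        print(f"{name}\t{summary}")
```

Saved under a hypothetical name such as summarize.py, running it against a directory of chapters would print one tab-delimited line per file. Unlike a generative model, this approach can only copy sentences verbatim; it cannot paraphrase or synthesize, which is one reason LLM-generated summaries tend to read more fluently.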

For any given document there is no single, 100% correct summary; everybody will summarize a document differently. That said, the results of this automated process look pretty good to me. Moreover, each list of summaries addresses difficult-to-answer questions such as:

how can Jane Austen's works be characterized?
what is rheumatoid arthritis and what are some of its treatments?
how is climate change being manifested across the globe?
how has the practice of cataloging changed over time?

The lists of summaries may themselves be deemed information overload, and one might consider summarizing the summaries. That exercise is left to the reader.

I believe libraries and librarians ought to learn how to exploit generative AI for summarization. Just as the migration from printed cards to MARC transformed how libraries hosted their catalogs, migrating from hand-crafted summaries to computed summaries will transform how information overload is managed.

[1] Blair, Ann. 2010. Too Much to Know: Managing Scholarly Information Before the Modern Age. New Haven, Conn.: Yale University Press.

Creator: Eric Lease Morgan <emorgan@nd.edu>
Source: This is the original publication of this posting.
Date created: 2024-06-27
Date updated: 2024-06-27
Subject(s): libraries and librarianship; large-language models (LLMs); summarization;
URL: https://distantreader.org/blog/summarization/