Generative-AI Summarization

Ann Blair's book Too Much to Know overflows with techniques that pre-modern and early modern scholars used to deal with information overload. [1] One of the most frequently used techniques is summarization. With the advent of generative AI, it is almost trivial to create more-than-plausible summaries of documents.

The linked Python script is an example. Given the path to a plain text file, the script loads a configured large language model, vectorizes the file, compares the two, and outputs a three-sentence summary. I enhanced the script to work in batch, and I have since used it to summarize whole collections of items.
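The vectorize-compare-output steps described above can be illustrated without any model at all. The following is only a minimal sketch, not the linked script: it swaps the large language model for plain bag-of-words term counts, and it is extractive, meaning it selects the three existing sentences most similar to the document as a whole rather than generating new prose. The function names (vectorize, cosine, summarize) are hypothetical and chosen for illustration.

```python
import math
import re
import sys
from collections import Counter


def vectorize(text):
    """Return a bag-of-words vector (term counts); a stand-in for LLM embeddings."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(text, size=3):
    """Return the `size` sentences most similar to the whole document."""
    # Naive sentence splitting on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    document = vectorize(text)

    # Compare each sentence vector against the document vector.
    scores = [cosine(vectorize(s), document) for s in sentences]

    # Keep the best-scoring sentences, then restore their original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:size]
    return " ".join(sentences[i] for i in sorted(top))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python summarize.py path/to/file.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(summarize(handle.read()))
```

A generative model would rephrase rather than select, so the output of this sketch will read more choppily than the summaries discussed here, but the shape of the process is the same: read a file, turn it into vectors, compare, and emit three sentences.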

For any given document there is no single correct summary; everybody will summarize a document differently. That said, the results of this automated process look pretty good to me. Moreover, each list of summaries addresses difficult-to-answer questions such as:

  • how can Jane Austen's works be characterized?
  • what is rheumatoid arthritis and what are some of its treatments?
  • how is climate change being manifested across the globe?
  • how has the practice of cataloging changed over time?

The lists of summaries may themselves be deemed information overload, and one might consider summarizing the summaries. Such an exercise is left up to the reader.
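For readers who want a starting point, summarizing the summaries might be sketched as a two-pass process: summarize each document, join the results, and summarize the joined text. The code below is an assumption-laden illustration, not the batch script mentioned above. The names summarize_collection and lead_sentences are hypothetical, and lead_sentences is a deliberately naive placeholder (it keeps the first three sentences) that would be replaced by whatever summarizer is actually in use.

```python
import re
from typing import Callable, Dict, Tuple


def lead_sentences(text: str, size: int = 3) -> str:
    """Placeholder summarizer: keep the first `size` sentences of the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return " ".join(sentences[:size])


def summarize_collection(
    documents: Dict[str, str],
    summarizer: Callable[[str], str] = lead_sentences,
) -> Tuple[Dict[str, str], str]:
    """Summarize each document, then summarize the concatenated summaries."""
    # First pass: one summary per document, keyed by document name.
    summaries = {name: summarizer(text) for name, text in documents.items()}

    # Second pass: treat the joined summaries as a single new document.
    overall = summarizer(" ".join(summaries.values()))
    return summaries, overall
```

Passing the summarizer in as a parameter keeps the two-pass logic independent of any particular model, so the same function works whether the summaries are extracted or generated.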

I believe libraries and librarians ought to learn how to exploit generative AI for summarization purposes. Just as the migration from printed cards to MARC transformed how libraries hosted catalogs, migrating from hand-crafted summaries to computed summaries will transform how information overload is managed.

[1] Blair, Ann. 2010. Too Much to Know: Managing Scholarly Information Before the Modern Age. New Haven, Conn.: Yale University Press.


Creator: Eric Lease Morgan <emorgan@nd.edu>
Source: This is the original publication of this posting.
Date created: 2024-06-27
Date updated: 2024-06-27
Subject(s): libraries and librarianship; large-language models (LLMs); summarization;
URL: https://distantreader.org/blog/summarization/