Generative-AI Summarization
Ann Blair's book Too Much to Know overflows with techniques scholars used before the modern age to deal with information overload. [1] One of the most frequently used techniques is summarization. With the advent of generative AI, it is almost trivial to create more-than-plausible summaries of documents.
The linked Python script is an example. Given the path to a plain text file, the script loads a configured large language model, vectorizes the file, compares the two, and outputs a three-sentence summary. I enhanced the script to run in batch, and I have used it to summarize several collections:
- each chapter in each book written by Jane Austen
- 250 journal articles on the topic rheumatoid arthritis
- another 250 journal articles on the topic of climate change
- 130 articles on the topic of cataloging
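The batch step can be sketched as a loop over a directory of plain text files. This is only a minimal sketch under stated assumptions, not the linked script: the function names are hypothetical, and placeholder_summarize is a naive word-frequency extractive stand-in that lets the example run without a model. To get generative summaries, pass a function that calls the configured large language model as the summarize argument.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences on ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def placeholder_summarize(text: str, size: int = 3) -> str:
    """Stand-in for a model call: keep the `size` highest-scoring sentences.

    This is extractive word-frequency scoring, NOT generative AI.
    """
    sentences = split_sentences(text)
    frequencies = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(frequencies[w] for w in words) / max(len(words), 1)

    # Rank sentence indexes by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:size])
    return " ".join(sentences[i] for i in keep)


def summarize_directory(
    directory: str,
    summarize: Callable[[str], str] = placeholder_summarize,
) -> dict[str, str]:
    """Summarize every .txt file in a directory; return {file name: summary}."""
    summaries = {}
    for path in sorted(Path(directory).glob("*.txt")):
        summaries[path.name] = summarize(path.read_text(encoding="utf-8"))
    return summaries
```

Calling summarize_directory on a folder of chapters or articles (for example, a hypothetical ./austen directory) returns one short summary per file, keyed by file name.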
No summary of a given document is 100% correct; everybody will summarize a document differently. That said, the results of this automated process look pretty good to me. Moreover, each list of summaries helps address difficult-to-answer questions such as:
- how can Jane Austen's works be characterized?
- what is rheumatoid arthritis and what are some of its treatments?
- how is climate change being manifested across the globe?
- how has the practice of cataloging changed over time?
The lists of summaries may be deemed information overload in and of themselves, and one might consider summarizing the summaries. Such an exercise is left to the reader.
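One possible approach to summarizing the summaries is to work in rounds: group the summaries, summarize each group, and repeat until a single summary remains. The sketch below is an assumption about how that might be done, not something taken from the linked script. The names summarize_summaries, first_sentences, and group_size are hypothetical, and first_sentences is a trivial placeholder (it keeps the first three sentences) standing in for a call to a large language model.

```python
import re
from typing import Callable


def first_sentences(text: str, size: int = 3) -> str:
    """Placeholder for a model call (NOT generative AI): keep the first sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(sentences[:size])


def summarize_summaries(
    summaries: list[str],
    summarize: Callable[[str], str] = first_sentences,
    group_size: int = 10,
) -> str:
    """Reduce many summaries to one by summarizing them in groups, round by round."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each round shrinks the list")
    if not summaries:
        return ""
    current = list(summaries)
    while len(current) > 1:
        # Split the current round into groups small enough to summarize at once.
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        current = [summarize(" ".join(group)) for group in groups]
    return current[0]
```

With the placeholder, the result simply favors whatever comes first in the list; the structure only becomes useful once summarize is replaced by a function that calls a model. The group_size value should be chosen so that each joined group fits comfortably within whatever input limit the model has.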
I believe libraries and librarians ought to learn how to exploit generative AI for summarization purposes. Just as the migration from printed cards to MARC transformed how libraries hosted catalogs, migrating from hand-crafted summaries to computed summaries will transform how information overload is managed.
[1] Blair, Ann. 2010. Too Much to Know: Managing Scholarly Information Before the Modern Age. New Haven, Conn.: Yale University Press.
Creator: Eric Lease Morgan <emorgan@nd.edu>
Source: This is the original publication of this posting.
Date created: 2024-06-27
Date updated: 2024-06-27
Subject(s): libraries and librarianship; large-language models (LLMs); summarization
URL: https://distantreader.org/blog/summarization/