Collecting and curating library journals


I've begun to collect and curate library-related journals, and this posting introduces the process.

As a librarian, I am interested in collecting, organizing, preserving, and disseminating data, information, and knowledge. With the advent of the Internet, the processes used to accomplish these goals increasingly include the use of computers. Moreover, it is important to use computers not only to automate tasks but also to add value to (curate) the result. I am interested in applying this process to library-related journals.

Creating the collection

To these ends, I began by identifying library-related titles listed in the venerable Directory of Open Access Journals (DOAJ). I limited my selections to titles published using Open Journal Systems (OJS) because titles published using OJS robustly support a protocol called the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In the end, I identified about thirty titles. I used these titles as input to a suite of software called OJS Toolbox Redux. This resulted in a set of bibliographic databases as well as a cache of all the articles from each journal. Finally, I transformed each cache into a data set (affectionately called a "study carrel") using the Distant Reader Toolbox.
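To give a sense of what harvesting with OAI-PMH involves, here is a minimal sketch in Python. It is not the code of OJS Toolbox Redux. The function names are hypothetical, and any base URL passed in is assumed to be a journal's OAI endpoint. Only the ListRecords verb, the oai_dc metadata prefix, the resumption token, and the two namespaces come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core metadata
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request for a (hypothetical) OAI endpoint."""
    if resumption_token:
        # follow-up requests carry only the verb and the token
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # the first request asks for simple Dublin Core records
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # an empty or missing token means there are no more pages to fetch
    return titles, (token or None)
```

A harvester would fetch the first URL, parse the response, and repeat with each returned token until none is left. The fetching itself is left out here so the sketch stays free of network calls.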

Below is a list of the titles I have collected so far, and each is linked to a collection of its articles, a study carrel. Every carrel has exactly the same layout, which makes them very computable. For example, all the articles from Information Technology & Libraries are located in its cache directory, and a rudimentary bibliography of the same is saved in its etc directory.
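Because the layout is uniform, the same few lines of code can be pointed at any carrel. The following Python sketch assumes only what is stated above, namely that each carrel has a cache directory containing its articles. The function names and any carrel paths are hypothetical, and this is not part of the Distant Reader Toolbox.

```python
from pathlib import Path

def cached_articles(carrel):
    """Return the sorted files found in a carrel's cache directory."""
    cache = Path(carrel) / "cache"
    return sorted(p for p in cache.iterdir() if p.is_file())

def summarize(carrels):
    """Map each carrel's name to its number of cached articles."""
    return {Path(c).name: len(cached_articles(c)) for c in carrels}
```

Given a list of carrel directories, summarize returns a simple table of extents, one count per title.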

To further understand study carrels, peruse the tutorial, which includes the things I've been reading for the past two years, and then browse the library-related titles (carrels) below:

Canadian Journal of Academic Librarianship * College and Research Libraries News * College and Research Libraries * Evidence Based Library and Information Practice * Information Technology and Libraries * International Journal of Digital Curation * International Journal of Information Diversity & Inclusion * International Journal of Librarianship * Issues in Science and Technology Libraries * Journal of Civic Information * Journal of Copyright in Education and Librarianship * Journal of Information Literacy * Journal of Rare Books Manuscripts and Cultural Heritage * Journal of the Canadian Health Libraries Association * Journal of the European Association for Health Information and Libraries * Journal of the Medical Library Association * Knowledge Creation Dissemination and Preservation Studies * Liber Quarterly: The Journal of European Research Libraries * Library and Information Research * North Carolina Libraries * Partnership: the Canadian Journal of Library and Information Practice and Research * Pennsylvania Libraries: Research & Practice * TCB: Technical Services in Religion & Theology * Theological Librarianship

Initial observations

A rigorous analysis of the collection is beyond the scope of this posting; after all, the collection includes about 22 thousand articles and 79 million words. (By comparison, Moby Dick is about 0.25 million words long.)

That said, some initial observations can be made, especially concerning extent. For example, College & Research Libraries (CRL) is by far the largest collection at 7,000 items and 30 million words, which is more than a third of the whole corpus. This is because the title goes back to 1939. (The oldest item in CRL is "Introducing 'College and Research Libraries': Why Another Library Journal" by A. F. Kuhlman.) By comparison, Information Technology & Libraries (ITAL) includes 790 items and 3.5 million words. It dates back to 1968. (The oldest item in ITAL is "Brown University Library Fund Accounting System" by Robert Wedgeworth.)

It is also interesting to compare and contrast the use of words between these two titles. Bigram analysis highlights the prominence of research in one and Web technologies in the other. The same pattern manifests itself when examining the statistically significant keywords:
[Figures: CRL bigrams, ITAL bigrams, CRL keywords, ITAL keywords]
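For readers unfamiliar with bigram analysis, counting two-word phrases can be sketched in a few lines of Python. This is a simplified illustration using only the standard library, not the Distant Reader Toolbox's implementation. The tokenization is deliberately naive, and the function name and the optional stop word set are assumptions of this example.

```python
import re
from collections import Counter

def bigrams(text, stopwords=frozenset()):
    """Count adjacent word pairs, skipping pairs that contain a stop word."""
    # naive tokenization: lowercase runs of the letters a-z
    words = re.findall(r"[a-z]+", text.lower())
    pairs = zip(words, words[1:])
    return Counter(p for p in pairs if not (set(p) & stopwords))
```

Calling most_common on the result lists the most frequent pairs first, which is the sort of ranking a bigram visualization is typically built from.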

Finally, in this analysis, one can concordance for phrases like "libraries are" to see how libraries are defined. Each line below is preceded by "libraries are". The first set is from CRL, and the second set is from ITAL:

  • always insufficient to meet the demand. in the years to come, withdrawals of rn
  • an index of institutional excellence, as investigators have found, the survival
  • dealing with printing in reference computer labs. the survey asked for quantita
  • generally not required to provide cost accountability or to justify the costs a
  • not neutral spaces: social justice advocacy in librarianship," library journal

  • already exploring creative approaches to providing internet access for these un
  • facing considerable infrastructure management issues at a time when library use
  • now interested in incorporating new web technologies into their offerings and o
  • torn between the values of providing safe access for younger patrons and broad
  • valued more than the internet for providing accurate information, privacy, and

The complete concordance for CRL and the complete concordance for ITAL are available here.
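Concordancing of this kind amounts to finding every occurrence of a phrase and printing the text that follows it. Here is a minimal Python sketch under that assumption. It is not the tool used to produce the lines above, and the function name and the width parameter are hypothetical.

```python
import re

def concordance(text, phrase, width=40):
    """Return the text that follows each occurrence of phrase, sorted."""
    # lowercase and collapse whitespace so line breaks do not hide matches
    normalized = re.sub(r"\s+", " ", text.lower())
    pattern = re.compile(re.escape(phrase.lower()) + r"\s+")
    return sorted(
        normalized[m.end():m.end() + width]
        for m in pattern.finditer(normalized)
    )
```

Sorting the results groups similar continuations together, which makes patterns in how a phrase is completed easier to spot.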


For a good time, I have begun to create a collection of library-related journals and curate the results.

In this day and age, it is not sufficient for libraries to stop at collection; instead, they must add value, provide some interpretation, and "Save the time of the reader." In the future and in my copious spare time, I hope to do some more curation against this collection and share additional observations.

Fun with librarianship in the 21st Century.

P.S. If you would like to create similar collections, then don't hesitate to drop me a line!

Creator: Eric Lease Morgan
Source: This posting was never formally published.
Date created: 2022-11-17
Date updated: 2022-11-17
Subject(s): libraries and librarianship;