Distant Reader Study Carrels, Curated
This is a list of curated Distant Reader study carrels: data sets intended to be read by people as well as computers. Each study carrel presented here has had some analysis -- commentary -- done against it. Such is the point of the Reader: 1) create a data set, 2) analyze ("read") it, 3) write up the analysis, and 4) share the result with the wider world.
The list is presented in reverse-chronological order:
- What are hermits? - For a good time, I collected and analyzed ("read") as many books and articles on the topic of hermits as I could, purely to learn about hermits and hermitages. What did I learn? Simply put, my understanding of what hermits are was reinforced. Hermits are solitary people -- usually men -- and they are usually interested in delving into religious experiences. Along the way I also learned a bit about hermit crabs.
- provenance: This is a Distant Reader study carrel. All of its items come from the HathiTrust and the Distant Reader Index. First, a set of 'Trust items with the subject "hermit" was identified, deduplicated with OpenRefine, and cached using the Trust's Data API. Second, the Index was queried for the subject term "hermits", and the results were cached. I then renamed the files and created a Reader metadata.csv file; the combined items from both caches make up the content of this study carrel.
- keywords: day; Distant Reader; distant reading; hands; hermit crabs; hermits; life; man; love; mind; study carrel; time
- date created: 2023-12-23
- DOI: 10.5281/zenodo.11475148
- A Year of Journal of Open Humanities Data - I asked myself, "What can I learn by applying distant reading computing techniques against a single year of content from the Journal of Open Humanities Data?" In a sentence, I learned a great deal about the Journal, and it very much lives up to its name.
- provenance: I believe I collected this data by hand, renamed the files, created a Reader metadata.csv file, and finally, built the carrel.
- keywords: Journal of Open Humanities Data; Distant Reader; distant reading; study carrel; dataset; information; research; analysis; corpora
- date created: 2023-07-13
- DOI: 10.5281/zenodo.11475069
- Analyzing the University of Notre Dame's Theses & Dissertations - For a number of reasons, I spent some time analyzing a subset of the University of Notre Dame's theses & dissertations, and this missive outlines what I learned.
- provenance: To create this study carrel, I applied the University's institutional repository API to the... institutional repository. This gave me basic bibliographic metadata. I then saved the abstracts as files, created a Reader metadata.csv file, and finally built the carrel.
- keywords: cells; data; Distant Reader; distant reading; energy; flow; religion; politics; society; study carrel; surface; systems; theses and dissertations; University of Notre Dame
- date created: 2023-06-28
- DOI: 10.5281/zenodo.11475134
- DEI in libraries - Relatively recently, a colleague (Peggy Griesinger) distributed a bibliography on the topic of diversity, equity, and inclusion (DEI), and I decided to spend some time analyzing the content of the bibliography. This missive outlines what I was able to extract, given the limited time I spent.
- provenance: Given a bibliography with DOIs, I manually downloaded as many of the associated articles as possible, simplified the resulting file names, created a metadata.csv file, and finally created the carrel.
- keywords: diversity, equity, and inclusion (DEI); Distant Reader; study carrel; distant reading; american; people; libraries; resources; knowledge; access; community
- date created: 2023-05-12
- DOI: 10.5281/zenodo.11474672
- Theological Librarianship (Volume 16, Number 1): More questions than answers - To see how quickly I could analyze the entire issue of a given journal, I did some reading against Theological Librarianship (Volume 16, Number 1). In the end, I came away with more questions than answers, but I did learn about "ethiopic".
- provenance: Quite frankly, I forget the provenance of this carrel. I probably downloaded the files by hand, renamed them, created a Reader metadata.csv file, and finally, built the carrel.
- keywords: collections; distant reading; Distant Reader; information; libraries; study carrel; Theological Librarianship; theology
- date created: 2023-04-27
- DOI: 10.5281/zenodo.11475131
- Emily Dickinson's Poems: Series #1, #2, and #3 - I did the quickest of readings against Emily Dickinson's three series of poems, and, quite frankly, I did not learn a whole lot.
- provenance: I downloaded plain text versions of Dickinson's poems from the HathiTrust. I then divided each work into individual poems, systematically naming each file along the way, created a Reader metadata.csv file, and built the carrel. For extra credit, I also downloaded PDF versions of the poems and bound them into books, but such is an aside.
- keywords: day; Distant Reader; distant reading; Emily Dickinson; flower; god; heaven; live; love; study carrel; summer; time
- date created: 2023-04-23
- DOI: 10.5281/zenodo.11474953
- How to Read a Whole HathiTrust Collection - This Web page outlines a process to use and understand ("read") the whole of a HathiTrust collection. The process is: 1) articulate a research question, 2) search the 'Trust and create a collection, 3) download the collection file and refine it, or at the least, remove duplicates, 4) use the result as input to htid2books to download the full text of each item, 5) use the Reader Toolbox to build a "study carrel" and thereby create a data set, 6) compute against the data set to address the research question, and 7) go to Step #1 and repeat iteratively.
- provenance: This entire data set is about creating study carrels; read the missive. That said, I used the HathiTrust Data API to download the PDF and plain text versions of 'Trust items. The coolest part of the process was using OpenRefine to... refine the provided collection file.
- keywords: beauty; data science; Distant Reader; distant reading; fear; HathiTrust; life; man; mind; nature; power; study carrel
- date created: 2023-03-29
- DOI: 10.5281/zenodo.11475113
- Discernment, or "It's a lot about Mom." - Joey Jegier and I would like to know, "When it comes to a set of first year student writings (reflections), with what other words does the word 'discernment' keep?" The answer to this question helps address other, broader questions about the ways students' hearts and minds mature over the first year of their college experience.
- provenance: To create this data set we collected a set of student reflections, akin to diary entries, and then we anonymized them. We then created the simplest of Reader metadata.csv files to ultimately build the carrel.
- keywords: career; community; discernment; Distant Reader; distant reading; life; love; people; study carrel; time; world
- date created: 2023-03-23
- DOI: 10.5281/zenodo.11475125
- Hesburgh Libraries Archives - I am curious to learn, "What sorts of things are in the Hesburgh Libraries Archives?"
- provenance: I first used a program called wget to download all of the Encoded Archival Description (EAD) files from the Archives' website. This generated a cache of 1,200 XML files. I then extracted the titles, biographical histories, and scope notes of 258 of these items. They represent Catholic manuscript collections and collections termed 'Notre Dame-related' (professor and alum personal papers, materials related to Notre Dame history, et cetera). These items did not include closed University records.
- keywords: club; college; Distant Reader; distant reading; hall; Hesburgh Libraries Archives; Notre Dame; students; study carrel
- date created: 2023-03-07
- DOI: 10.5281/zenodo.11475000
- Race and Ethnic Relations - I applied bits of text mining, natural language processing, and data science to a pair of annual editions of Race and Ethnic Relations, and below is a summary of what I learned.
- provenance: I was given two editions of Race and Ethnic Relations dated 1994/95 and 1997/98. Each edition is about 275 pages long and comprises about 50 articles. I proceeded to digitize each edition, and then I divided each one into 50 segments in an effort to approximate each article. I then created the simplest of Reader metadata.csv files and built the carrel.
- keywords: american; asians; blacks; chinese; Distant Reader; distant reading; people; politics; Race and Ethnic Relations; race; study carrel; women; world
- date created: 2023-02-07
- DOI: 10.5281/zenodo.11475100
- Public schools in New Orleans by topic - I did some rudimentary investigations of public schools in New Orleans after extracting sets of documents on given topics, namely: black, charter, new, schools, and teachers. Below is an introduction to what I found.
- provenance: This is a dummy provenance statement. Ask Eric to update it.
- keywords: New Orleans; public schools; charter schools; Hurricane Katrina
- date created: 2023-02-02
- DOI: 10.5281/zenodo.11475077
- Code4Lib Journal, Issue 55 - The latest issue of Code4Lib Journal came out yesterday, and I wanted to see how quickly I could garner insights regarding the issue's themes, topics, and questions addressed. I was able to satisfy my curiosity about these self-imposed challenges, but ironically, it took me longer to write this blog posting than it did for me to do the analysis.
- provenance: By hand, I downloaded each article from the issue, simplified the names of the files, created a Reader metadata.csv file, and built the study carrel.
- keywords: Code4Lib Journal; Distant Reader; distant reading; study carrel; library; file; data; open source; video; Google
- date created: 2023-01-21
- DOI: 10.5281/zenodo.11474939
- Where in the world is TCB? - A colleague (Christa Strickler) announced on a mailing list (ACQNET) the existence of a new issue of TCB (Technical Services in Religion and Theology). It was touted as an open access journal, and I wondered whether or not there was an application programmer interface (API) for downloading the content. After a bit of rooting around, I discovered that TCB is published using a system called Open Journal Systems (OJS), and OJS rigorously supports a protocol called OAI-PMH. So, to answer my question: "Yes, TCB does support an API." This data set outlines some of the things I learned through the application of distant reading against it.
- provenance: The content of this carrel was harvested via OAI-PMH from the platform hosting TCB. Once created, the carrel languished until I wanted to more formally publish it, and it was at that time when I updated the carrel to its current structure. The analysis remains the same.
- keywords: cataloging; classification; Distant Reader; distant reading; information; library; RDA; religion; study carrel; services; Technical Services in Religion and Theology; theology
- date created: 2022-10-28
- DOI: 10.5281/zenodo.11475154
- Reading Journal of eScience Librarianship: Responsible AI in Libraries and Archives - A special issue of the Journal of eScience Librarianship was brought to my attention. The issue was on the topic of responsible AI in libraries and archives. I did a bit of distant reading against the issue, and outlined here are some of my take-aways. In short, AI is something to consider in Library Land, but not without some forethought.
- provenance: I manually downloaded a set of articles from the journal issue's website, renamed the files accordingly, created a metadata.csv file, and created the study carrel.
- keywords: artificial intelligence; data; Distant Reader; distant reading; escience; Journal of eScience Librarianship; librarianship; libraries; study carrel
- date created: 2022-10-28
- DOI: 10.5281/zenodo.11475052
- Reading HaunthiTrust, or Spooky Season fun in the HTDL - Just for fun, let's read a HathiTrust collection called HaunthiTrust.
- provenance: This one was tricky. First, I used the HathiTrust Data API to download and cache both the PDF and plain text versions of the items in the given HathiTrust collection file. Then, because the downloaded PDF files had zero OCR, I used the plain text as input for the Reader's build process. Once the carrel was built, I replaced the plain text files in the cache with their corresponding PDF files. This makes for a big study carrel, but the PDF files are intended to be read complete with their pictures.
- keywords: day; Distant Reader; distant reading; doors; eyes; HathiTrust; head; place; study carrel; thought; time; spooky; ghosts; haunted
- date created: 2022-10-22
- DOI: 10.5281/zenodo.11474982
- Reading Information Technology and Libraries (volume 41, number 2, June 2022) - Today, for a good time, I applied my Reader Toolbox to the latest issue of ITAL for the two-fold purposes of: 1) just seeing whether the Toolbox could function, and 2) determining the degree to which I could extract meaningful themes from the issue. Well, the Toolbox functioned, in that it neither crashed nor output invalid data, and I do believe I could pull out themes, in that each of the issue's authors wrote about something distinctive, and I could identify those things. My process is described below.
- provenance: I'm pretty certain I manually copied each article from the ITAL website, renamed the files, created a metadata.csv file, and built a study carrel -- this data set -- from the result.
- keywords: Information Technology and Libraries (ITAL); distant reading; Distant Reader; study carrel; libraries; technology; information; research
- date created: 2022-06-22
- DOI: 10.5281/zenodo.11475022
- University of Notre Dame News: A Reading - I have done a bit of analysis -- reading -- against the set of news distributed by the University of Notre Dame, and below is some of what I learned.
- provenance: I used a program called wget to crawl the news site and cache the result. Upon closer inspection of the cache, I noticed how some of the Web pages were echoed and indexed in a number of auxiliary pages. I deleted the echoes and index pages, and I copied all of the news stories to a single directory. I then applied a tool called the Distant Reader Toolbox against the directory. This resulted in a data set of news stories which I proceeded to analyze.
- keywords: awards; Distant Reader; distant reading; lectures; people; research; football; students; professors; study carrel; University of Notre Dame news
- date created: 2022-02-22
- DOI: 10.5281/zenodo.11475087
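Several entries above (the hermits carrel and "How to Read a Whole HathiTrust Collection") mention removing duplicates from a HathiTrust collection file, done there with OpenRefine. The following is only a plain-Python stand-in for that step, not OpenRefine itself: it assumes the collection file is tab-delimited with columns named `htitem_id` and `title` (check your own download), and it uses a simplified key in the spirit of OpenRefine's "fingerprint" clustering method.

```python
import csv
import io
import re


def fingerprint(title):
    """Simplified key in the spirit of OpenRefine's 'fingerprint' method:
    lowercase, strip punctuation, then sort the unique words."""
    words = re.sub(r"[^\w\s]", "", title.lower()).split()
    return " ".join(sorted(set(words)))


def deduplicate(tsv_text, id_field="htitem_id", title_field="title"):
    """Return the ids of rows whose title fingerprint has not been seen before.
    The column names are assumptions about the collection file's layout."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    seen, kept = set(), []
    for row in reader:
        key = fingerprint(row.get(title_field) or "")
        if key and key in seen:
            continue  # same fingerprint as an earlier row: treat as a duplicate
        seen.add(key)
        kept.append(row[id_field])
    return kept
```

Keying on title alone is deliberately crude; it can merge distinct editions or volumes that share a title, so reviewing the clusters by hand (as OpenRefine encourages) is still worthwhile.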
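The TCB entry says its content was harvested via OAI-PMH. A minimal, standard-library sketch of such a harvest is shown here, assuming simple Dublin Core (`oai_dc`) records, which every OAI-PMH repository is required to support. The endpoint URL in the usage comment is hypothetical, and the sketch keeps only titles and identifiers; a real harvest would also need to handle OAI-PMH error responses such as `noRecordsMatch`.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core
OAI = "http://www.openarchives.org/OAI/2.0/"
NS = {"oai": OAI, "dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, token=None):
    """Build a ListRecords request; a resumptionToken is sent with the verb only."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI}}}record"):
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        identifiers = [e.text.strip() for e in record.findall(".//dc:identifier", NS)
                       if e.text and e.text.strip()]
        records.append({"title": title.strip(), "identifiers": identifiers})
    # An empty or missing resumptionToken means this was the last page
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


def harvest(base_url):
    """Yield every record, following resumption tokens page by page."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token)) as response:
            records, token = parse_page(response.read())
        yield from records
        if token is None:
            break

# Usage (hypothetical endpoint):
# for rec in harvest("https://example.org/index.php/journal/oai"):
#     print(rec["title"], rec["identifiers"])
```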
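The Hesburgh Libraries Archives entry describes extracting titles, biographical histories, and scope notes from EAD files. A rough sketch of that extraction follows, assuming EAD 2002-style files where these live under `archdesc` as `did/unittitle`, `bioghist`, and `scopecontent`; namespaces are stripped so files with or without the EAD namespace both work. This is an illustration, not the script actually used.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop any '{namespace}' prefix so namespaced and plain EAD files both match."""
    return tag.rsplit("}", 1)[-1]


def text_of(element):
    """Join all text in an element with spaces (so <head> and <p> do not run
    together), then collapse runs of whitespace."""
    return " ".join(" ".join(element.itertext()).split())


def extract_ead(xml_text):
    """Return the collection-level title, biographical history, and scope note,
    or None when the file has no archdesc element."""
    root = ET.fromstring(xml_text)
    archdesc = next((e for e in root.iter() if local(e.tag) == "archdesc"), None)
    if archdesc is None:
        return None
    fields = {"title": "", "bioghist": "", "scopecontent": ""}
    for child in archdesc:  # direct children only: collection-level, not component-level, notes
        name = local(child.tag)
        if name == "did":
            titles = [item for item in child if local(item.tag) == "unittitle"]
            if titles:
                fields["title"] = text_of(titles[0])
        elif name in ("bioghist", "scopecontent"):
            fields[name] = (fields[name] + " " + text_of(child)).strip()
    return fields
```

Each resulting dictionary could then be written out as one plain text file per collection, which is the form a study carrel build expects as input.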
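Nearly every provenance statement above ends with "created a Reader metadata.csv file." A minimal sketch of generating such a file from a directory of renamed files is given here. The column names (`file`, `author`, `title`, `date`), the file's location alongside the content, and the file-naming convention used to derive titles are all assumptions for illustration; consult the Reader Toolbox documentation for the exact schema it expects.

```python
import csv
from pathlib import Path

# Assumed column names; check the Reader Toolbox documentation for the real schema
FIELDS = ["file", "author", "title", "date"]


def title_from_name(path):
    """Hypothetical convention: 'dickinson-poem-001.txt' -> 'dickinson poem 001'."""
    return path.stem.replace("-", " ").replace("_", " ")


def write_metadata(directory, extensions=(".txt", ".pdf")):
    """Write metadata.csv with one row per matching file; return the row count.
    Author and date are left blank for filling in by hand."""
    directory = Path(directory)
    files = sorted(p for p in directory.iterdir() if p.suffix.lower() in extensions)
    with open(directory / "metadata.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for path in files:
            writer.writerow({"file": path.name, "author": "",
                             "title": title_from_name(path), "date": ""})
    return len(files)
```

Systematic file names, as in the Dickinson carrel, are what make this kind of automation possible; the richer the naming convention, the more columns can be filled without manual editing.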
Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
University of Notre Dame
June 4, 2024