The Distant Reader - A tool for reading
The Distant Reader is a tool for reading.
The Distant Reader empowers you to use & understand large amounts of textual information both quickly & easily. For example, the Distant Reader can consume the entire issue of a scholarly journal, the complete works of a given author, or the content found at the other end of an arbitrarily long list of URLs. Thus, the Distant Reader is akin to a book's table-of-contents or back-of-the-book index but at scale. It simplifies the process of identifying trends & anomalies in a corpus, and then it enables you to further investigate those trends & anomalies.
Technically speaking, the Distant Reader is a system that harvests the content you specify and caches it locally. It then transforms the content into plain text, applies natural language processing & text mining to the text, saves the results in a number of formats, reduces the whole to a cross-platform database file, queries the database to summarize the collection, zips the results of the entire process into a single file, and makes the file available to you for further investigation -- "reading".
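The process described above can be sketched in a few lines of Python. This is only an illustration, not the Distant Reader's actual code: the sample documents, the function names, the table layout, and the choice of SQLite as the database are all assumptions made for the sake of the example, and simple word counting stands in for the much richer natural language processing the real system performs.

```python
import io
import re
import sqlite3
import zipfile
from collections import Counter

# Hypothetical stand-in for harvested & cached content: file name -> raw text.
CACHE = {
    "doc1.txt": "Love is a theme.   Love recurs in the corpus.",
    "doc2.txt": "Art and beauty;\nbeauty and art.",
}

def to_plain_text(raw):
    """Collapse whitespace; a real system would also convert PDF, HTML, etc."""
    return re.sub(r"\s+", " ", raw).strip()

def tokenize(text):
    """Split text into lowercase alphabetic words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_database(cache):
    """Count words per document and store the counts in an SQLite database.

    SQLite and the 'words' table are assumptions; an in-memory database
    is used here so the sketch leaves no files behind.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE words (doc TEXT, word TEXT, freq INTEGER)")
    for name, raw in cache.items():
        counts = Counter(tokenize(to_plain_text(raw)))
        db.executemany(
            "INSERT INTO words VALUES (?, ?, ?)",
            [(name, word, freq) for word, freq in counts.items()],
        )
    db.commit()
    return db

def summarize(db):
    """Query the database for collection-level totals and the most frequent words."""
    docs, words = db.execute(
        "SELECT COUNT(DISTINCT doc), SUM(freq) FROM words"
    ).fetchone()
    top = db.execute(
        "SELECT word, SUM(freq) AS n FROM words "
        "GROUP BY word ORDER BY n DESC, word LIMIT 3"
    ).fetchall()
    return {"documents": docs, "words": words, "top": top}

def package(cache, summary):
    """Zip the plain text and the summary into a single archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, raw in cache.items():
            archive.writestr(f"txt/{name}", to_plain_text(raw))
        archive.writestr("summary.txt", repr(summary))
    return buffer.getvalue()

if __name__ == "__main__":
    database = build_database(CACHE)
    report = summarize(database)
    print(report)
    print(len(package(CACHE, report)), "bytes in the zip file")
```

Each function corresponds to one step in the description: transform, analyze, reduce to a database, query, and zip. Keeping the steps separate means any one of them (say, the tokenizer) can be swapped for something more sophisticated without touching the rest.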
Here is a list of sample corpora, their about pages, and links to their data sets:
- Cultural Analytics - about page, search, data set
  - 318,287 words; 33 documents; 67 uncompressed MB
  - all articles from a journal named Cultural Analytics as of June 1, 2019
- Plato - about page, search, data set
  - 929,704 words; 24 documents; 112 uncompressed MB
  - the complete works of Plato
- Code4Lib Journal - about page, search, data set
  - 1,234,348 words; 303 documents; 286 uncompressed MB
  - all articles from a journal named Code4Lib Journal as of June 1, 2019
- aesthetics - about page, search, data set
  - 2,296,890 words; 37 documents; 287 uncompressed MB
  - books classified as the philosophy of art
- love stories - about page, search, data set
  - 238,374,038 words; 460 documents; 5.94 uncompressed GB
  - books classified as love stories
I don't know about you, but nowadays I can find plenty of scholarly & authoritative content. My problem is not one of discovery but one of comprehension. How do I make sense of all the content I find? The Distant Reader is intended to address this question by making observations against a corpus and providing tools for interpreting the results. It is my hope you will find the tool as useful as I do.
Eric Lease Morgan <firstname.lastname@example.org>
June 14, 2019