id	author	title	date	pages	extension	mime	words	sentences	flesch	summary	cache	txt
infomotions-com-6757		Infomotions Mini-Musings			.xml	application/rss+xml	11414	796	66	It means PDF files need to have been “born digitally” or they need to have been processed with optical character recognition (OCR), and then … Continue reading Creating a plain text version of a corpus with Tika This essay describes, illustrates, and demonstrates how the Digital Public Library of America (DPLA) can build on the good work of others who support the creation and maintenance of collections and provide value-added services against texts, a concept we call “use & understand”. I decided to give it a whirl and participate in the DPLA Beta Sprint, and below is my submission: DPLA Beta Sprint Submission My DPLA Beta Sprint submission will describe and demonstrate how the digitized versions of library collections can be made more useful through the application of text mining and various other digital humanities … Continue reading DPLA Beta Sprint Submission This posting describes the initial process I am using to do such a thing, but the important thing to note is that this process is more about librarianship than it is … Continue reading Collecting the Great Books	./cache/infomotions-com-6757.xml	./txt/infomotions-com-6757.txt