Infomotions Mini-Musings Artist- and Librarian-At-Large Charting & graphing with Tableau Public They say, “A picture is worth a thousand words”, and through the use of something like Tableau this can become a reality in text mining. After extracting features from a text, you will have almost invariably created lists. Each of the items on the lists will be characterized with bits of context thus transforming the raw … Continue reading Charting & graphing with Tableau Public Extracting parts-of-speech and named entities with Stanford tools Extracting specific parts-of-speech as well as “named entities”, and then counting & tabulating them can be quite insightful. Parts-of-speech include nouns, verbs, adjectives, adverbs, etc. Named entities are specific types of nouns, including but not limited to, the names of people, places, organizations, dates, times, money amounts, etc. By creating features out of parts-of-speech and/or … Continue reading Extracting parts-of-speech and named entities with Stanford tools Creating a plain text version of a corpus with Tika It is imperative to create plain text versions of corpus items. Text mining cannot be done without plain text data. This means HTML files need to be rid of markup. It means PDF files need to have been “born digitally” or they need to have been processed with optical character recognition (OCR), and then … Continue reading Creating a plain text version of a corpus with Tika Identifying themes and clustering documents using MALLET Topic modeling is an unsupervised machine learning process. It is used to create clusters (read “subsets”) of documents, and each cluster is characterized by sets of one or more words.
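The idea of characterizing each cluster by a set of words can be illustrated with a toy sketch. This is not MALLET's actual algorithm (MALLET implements latent Dirichlet allocation); it only shows how a cluster of documents might be labeled by its most frequent words, using hypothetical sample data:

```python
from collections import Counter

def cluster_labels(clusters, top_n=3):
    """Characterize each cluster of documents by its most frequent words,
    mimicking how topic modelers label topics. (Toy illustration only;
    MALLET itself uses latent Dirichlet allocation.)"""
    labels = {}
    for name, documents in clusters.items():
        counts = Counter(word for doc in documents for word in doc.lower().split())
        labels[name] = [w for w, _ in counts.most_common(top_n)]
    return labels

# two hypothetical "clusters" of tiny documents
clusters = {
    "a": ["the whale swam", "the white whale"],
    "b": ["walden pond is quiet", "walden in winter"],
}
print(cluster_labels(clusters, top_n=2))
```

In practice one would also remove stop words before counting, so that labels like “the” do not dominate.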
Topic modeling is good at answering questions like, “If I were to describe this collection of documents in a single word, then what might that … Continue reading Identifying themes and clustering documents using MALLET Introduction to the NLTK The venerable Python Natural Language Toolkit (NLTK) is well worth the time of anybody who wants to do text mining more programmatically. [0] For much of my career, Perl has been the language of choice when it came to processing text, but in the recent past it seems to have fallen out of favor. I … Continue reading Introduction to the NLTK Using Voyant Tools to do some “distant reading” Voyant Tools is often the first go-to tool used by either: 1) new students of text mining and the digital humanities, or 2) people who know what kind of visualization they need/want. [1] Voyant Tools is also one of the longest supported tools described in this bootcamp. As stated in the Tools’ documentation: “Voyant Tools is … Continue reading Using Voyant Tools to do some “distant reading” Using a concordance (AntConc) to facilitate searching keywords in context A concordance is one of the oldest of text mining tools, dating back to at least the 13th century when they were used to analyze and “read” religious texts. Stated in modern-day terms, concordances are key-word-in-context (KWIC) search engines. Given a text and a query, concordances search for the query in the text, and return … Continue reading Using a concordance (AntConc) to facilitate searching keywords in context Word clouds with Wordle A word cloud, sometimes called a “tag cloud”, is a fun, easy, and popular way to visualize the characteristics of a text.
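Under the hood, a word cloud is driven by nothing more than word-frequency counts: the bigger the count, the bigger the word. A minimal sketch in Python, with a hypothetical sample sentence and stop-word list:

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset({"the", "a", "of", "and"})):
    """Count word frequencies -- the raw material of a word cloud,
    where bigger counts become bigger words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

freq = word_frequencies("The whale, the white whale, breached and sounded.")
print(freq.most_common(2))  # "whale" appears most often
```

A tool like Wordle then maps these counts onto font sizes (and colors) before amassing the words into a cloud.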
Usually used to illustrate the frequency of words in a text, word clouds make some features (“words”) bigger than others, sometimes colorize the features, and amass the result in a sort … Continue reading Word clouds with Wordle An introduction to the NLTK: A Jupyter Notebook The attached file introduces the reader to the Python Natural Language Toolkit (NLTK). The Python NLTK is a set of modules and corpora enabling the reader to do natural language processing against corpora of one or more texts. It goes beyond text mining and provides tools to do machine learning, but this Notebook barely scratches … Continue reading An introduction to the NLTK: A Jupyter Notebook What is text mining, and why should I care? [This is the first of a number of postings on the topic of text mining. More specifically, this is the first draft of an introductory section of a hands-on bootcamp scheduled for ELAG 2018. As I write the bootcamp’s workbook, I hope to post things here. Your comments are most welcome. –ELM] Text mining is … Continue reading What is text mining, and why should I care? How to do text mining in 69 words Doing just about any type of text mining is a matter of: 0) articulating a research question, 1) acquiring a corpus, 2) cleaning the corpus, 3) coercing the corpus into a data structure one’s software can understand, 4) counting & tabulating characteristics of the corpus, and 5) evaluating the results of Step #4. Everybody wants … Continue reading How to do text mining in 69 words Achieving perfection Through the use of the Levenshtein algorithm, I am achieving perfection when it comes to searching VIAF. Well, almost. I am making significant progress with VIAF Finder [0], but now I have exploited the use of the Levenshtein algorithm. In fact, I believe I am now able to programmatically choose VIAF identifiers for more than … Continue reading Achieving perfection VIAF Finder This posting describes VIAF Finder.
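The Levenshtein algorithm mentioned above measures edit distance: the number of single-character insertions, deletions, and substitutions needed to turn one string into another. A small distance between a local heading and a candidate VIAF heading suggests they name the same entity. A minimal sketch of the standard dynamic-programming version, with hypothetical name strings:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance: the minimum number of
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]

print(levenshtein("Morgan, Eric", "Morgan, Erik"))  # 1
```

A matching routine might accept the candidate whose distance, relative to the heading's length, falls below some threshold.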
In short, given the values from MARC fields 1xx$a, VIAF Finder will try to find and record a VIAF identifier. [0] This identifier, in turn, can be used to facilitate linked data services against authority and bibliographic data. Quick start Here is the way to quickly get started: download and … Continue reading VIAF Finder Making stone soup: Working together for the advancement of learning and teaching It is simply not possible for any of us to do our jobs well without the collaboration of others. Yet specialization abounds, jargon proliferates, and professional silos are everywhere. At the same time we all have a shared goal: to advance learning and teaching. How are we to balance these two seemingly conflicting characteristics in … Continue reading Making stone soup: Working together for the advancement of learning and teaching Protected: Simile Timeline test There is no excerpt because this is a protected post. Editing authorities at the speed of four records per minute This missive outlines and documents an automated process I used to “cleanup” and “improve” a set of authority records, or, to put it another way, how I edited authorities at the speed of four records per minute. As you may or may not know, starting in September 2015, I commenced upon a sort of “leave … Continue reading Editing authorities at the speed of four records per minute Failure to communicate In my humble opinion, what we have here is a failure to communicate. Libraries, especially larger libraries, are increasingly made up of many different departments, including but not limited to departments such as: cataloging, public services, collections, preservation, archives, and now-a-days departments of computer staff. From my point of view, these various departments fail to … Continue reading Failure to communicate Using BIBFRAME for bibliographic description Bibliographic description is an essential process of librarianship. 
In the distant past this process took the form of simple inventories. In the last century we saw bibliographic description evolve from the catalog card to the MARC record. With the advent of globally networked computers and the hypertext transfer protocol, we are seeing the emergence of … Continue reading Using BIBFRAME for bibliographic description XML 101 This past Fall I taught “XML 101” online to library school graduate students. This posting echoes the scripts of my video introductions, and I suppose this posting could also be used as a very gentle introduction to XML for librarians. Introduction I work at the University of Notre Dame, and my title is Digital Initiatives … Continue reading XML 101 Mr. Serials continues The (ancient) Mr. Serials Process continues to support four mailing list archives, specifically, the archives of ACQNET, Colldv-l, Code4Lib, and NGC4Lib, and this posting simply makes the activity explicit. Mr. Serials is/was a process I developed quite a number of years ago as a method for collecting, organizing, and archiving electronic journals (serials). The process worked … Continue reading Mr. Serials continues Re-MARCable This blog posting contains: 1) questions/statements about MARC posted by graduate library school students taking an online XML class I’m teaching this semester, and 2) my replies. Considering my previously published blog posting, you might say this posting is “re-MARCable”. I’m having some trouble accessing the file named data.marc for the third question in … Continue reading Re-MARCable MARC, MARCXML, and MODS This is the briefest of comparisons between MARC, MARCXML, and MODS. It was written for a set of library school students learning XML. MARC is an acronym for Machine Readable Cataloging.
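A MARC record (ISO 2709) is a 24-byte leader, a directory of 12-byte entries, and a run of variable-length fields separated by delimiter characters. A bare-bones reader can be sketched in a few lines; the hand-built sample record below is hypothetical, and real records deserve a proper library such as pymarc:

```python
FT, RT, SF = "\x1e", "\x1d", "\x1f"  # field, record, and subfield delimiters

def parse_marc(record):
    """Parse one ISO 2709 (MARC) record into (tag, value) pairs.
    A bare-bones sketch; real records need pymarc-level care."""
    base = int(record[12:17])                # base address of the data
    directory = record[24:record.index(FT)]  # 12-byte directory entries
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields

# build a tiny, hypothetical record by hand: a single 245 (title) field
field = "10" + SF + "aMoby Dick" + FT
entry = "245" + f"{len(field):04d}" + "00000"
base = 24 + len(entry) + 1
leader = (f"{base + len(field) + 1:05d}" + "nam a22" + f"{base:05d}" + "za4500").ljust(24)
record = leader + entry + FT + field + RT
print(parse_marc(record))
```

MARCXML and MODS re-express this same structure as XML, trading compactness for readability and XSLT-friendliness.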
It was designed in the 1960s, and its primary purpose was to ship bibliographic data on tape to libraries that wanted to print catalog … Continue reading MARC, MARCXML, and MODS “Sum reflextions” on travel These are “sum reflextions” on travel; travel is a good thing, for many reasons. I am blogging in front of the Pantheon. Amazing? Maybe. Maybe not. But the ability to travel, see these sorts of things, experience the different languages and cultures truly is amazing. All too often we live in our own little worlds, … Continue reading “Sum reflextions” on travel What is old is new again The “how’s” of librarianship are changing, but not the “what’s”. (This is an outline for my presentation given at the ADLUG Annual Meeting in Rome (October 21, 2015). Included here are also the one-page handout and slides, both in the form of PDF documents.) Linked Data Linked Data is a method of describing objects, and … Continue reading What is old is new again Painting in Tuscany As you may or may not know, I have commenced upon a sort of leave of absence from my employer, and I spent the better part of the last two weeks painting in Tuscany. Eight other students and I arrived in Arezzo (Italy) on Wednesday, October 1, and we were greeted by Yves … Continue reading Painting in Tuscany My water collection predicts the future As many of you may or may not know, I collect water, and it seems as if my water collection predicts the future, sort of. Since 1979 or so, I’ve been collecting water. [1] The purpose of the collection is/was to enable me to see and experience different parts of the world whenever I desired. As … Continue reading My water collection predicts the future Some automated analysis of Richard Baxter’s works baxter This page describes a corpus named baxter. It is a programmatically generated report against the full text of all the writing of Richard Baxter (an English Puritan church leader, poet, and hymn-writer) as found in Early English Books Online.
It was created using a (fledgling) tool called the EEBO Workset Browser. General statistics An … Continue reading Some automated analysis of Richard Baxter’s works Marrying close and distant reading: A THATCamp project The purpose of this page is to explore and demonstrate some of the possibilities of marrying close and distant reading. By combining both of these processes there is hope that greater comprehension and understanding of a corpus can be gained when compared to using close or distant reading alone. (This text might also be republished … Continue reading Marrying close and distant reading: A THATCamp project Great Books Survey I am happy to say that the Great Books Survey is still going strong. Since October of 2010 it has been answered 24,749 times by 2,108 people from all over the globe. To date, the top five “greatest” books are Athenian Constitution by Aristotle, Hamlet by Shakespeare, Don Quixote by Cervantes, Odyssey by Homer, … Continue reading Great Books Survey Doing What I’m Not Suppose To Do I suppose I’m doing what I’m not supposed to do. One of those things is writing in books. I’m attending a local digital humanities conference. One of the presenters described and demonstrated a program from MIT called Annotation Studio. Using this program a person can upload some text to a server, annotate the text, and … Continue reading Doing What I’m Not Suppose To Do Publishing LOD with a bent toward archivists This essay provides an overview of linked open data (LOD) with a bent towards archivists. It enumerates a few advantages the archival community has when it comes to linked data, as well as some distinct disadvantages. It demonstrates one way to expose EAD as linked data through the use of XSLT … Continue reading Publishing LOD with a bent toward archivists Fun with Koha These are brief notes about my recent experiences with Koha.
Introduction As you may or may not know, Koha is a granddaddy of library-related open source software, and it is an integrated library system to boot. Such are no small accomplishments. For reasons I will not elaborate upon, I’ve been playing with Koha for … Continue reading Fun with Koha Fun with ElasticSearch and MARC For a good time I have started to investigate how to index MARC data using ElasticSearch. This posting outlines some of my initial investigations and hacks. ElasticSearch seems to be an increasingly popular indexer. Getting it up and running on my Linux host was… trivial. It comes with a full-fledged Perl interface. Nice! Since ElasticSearch … Continue reading Fun with ElasticSearch and MARC LiAM source code: Perl poetry #!/usr/bin/perl # Liam Guidebook Source Code; Perl poetry, sort of # Eric Lease Morgan <emorgan@nd.edu> # February 16, 2014 # done exit; #!/usr/bin/perl # marc2rdf.pl – make MARC records accessible via linked data # Eric Lease Morgan <eric_morgan@infomotions.com> # December 5, 2013 – first cut; # configure use constant ROOT => ‘/disk01/www/html/main/sandbox/liam’; use constant MARC … Continue reading LiAM source code: Perl poetry LiAM SPARQL Endpoint I have implemented a brain-dead and half-baked SPARQL endpoint to a subset of LiAM linked data, but here is the disclaimer. Errors will probably happen because of SPARQL syntax errors. Your mileage will vary. Here are a few sample queries: Find all triples with RDF Schema labels – PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#> SELECT * WHERE { ?s … Continue reading LiAM SPARQL Endpoint EAD2RDF I have played with an XSL stylesheet called EAD2RDF with good success. Archivists use EAD as their “MARC” records. EAD has its strengths and weaknesses, just like any metadata standard, but EAD is a flavor of XML. As such it lends itself to XSLT processing. EAD2RDF is a stylesheet written by Pete Johnston.
After running … Continue reading EAD2RDF OAI2LOD Server At first glance, a software package called OAI2LOD Server seems to work pretty well, and on a temporary basis, I have made one of my OAI repositories available as Linked Data — http://infomotions.com:2020/ OAI2LOD Server is a software package, written by Bernhard Haslhofer in 2008. Building, configuring, and running the server was all but painless. … Continue reading OAI2LOD Server TriLUG, open source software, and satisfaction This is a posting about TriLUG, open source software, and the satisfaction of a job well done. A long time ago, in a galaxy far far away, I lived in Raleigh (North Carolina), and a fledgling community was growing called the Triangle Linux User’s Group (TriLUG). I participated in a few of their meetings. While I was … Continue reading TriLUG, open source software, and satisfaction Use & understand: A DPLA beta-sprint proposal This essay describes, illustrates, and demonstrates how the Digital Public Library of America (DPLA) can build on the good work of others who support the creation and maintenance of collections and provide value-added services against texts — a concept we call “use & understand”. This document is available in three formats: 1) HTML … Continue reading Use & understand: A DPLA beta-sprint proposal Raising awareness of open access publications I was asked the other day about ways to make people aware of open access journal publications, and this posting echoes much of my response. Thanks again for taking the time this morning to discuss some of the ways open-access journals are using social media and other technology to distribute content and engage readers. I … Continue reading Raising awareness of open access publications Poor man’s restoration This posting describes a poor man’s restoration process. Yesterday, I spent about an hour and a half writing down a work/professional to-do list intended to span the next few months.
I prioritized things, elaborated on things, and felt like I had the good beginnings of an implementable plan. I put the fruits of my labors … Continue reading Poor man’s restoration My DPLA Beta-Sprint Proposal: The movie Please see my updated and more complete Digital Public Library of America Beta-Sprint Proposal. The following posting is/was a precursor. The organizers of the Digital Public Library of America asked the Beta-Sprint Proposers to create a video outlining the progress of their work. Below is the script of my video as well as the video … Continue reading My DPLA Beta-Sprint Proposal: The movie DPLA Beta Sprint Submission I decided to give it a whirl and participate in the DPLA Beta Sprint, and below is my submission: DPLA Beta Sprint Submission My DPLA Beta Sprint submission will describe and demonstrate how the digitized versions of library collections can be made more useful through the application of text mining and various other digital humanities … Continue reading DPLA Beta Sprint Submission Next-generation library catalogs, or ‘Are we there yet?’ Next-generation library catalogs are really indexes, not catalogs, and increasingly the popular name for such things is “discovery system”. Examples include VuFind, Primo combined with Primo Central, Blacklight, Summon, and to a lesser extent Koha, Evergreen, OLE, and XC. While this may be a well-accepted summary of the situation, I really do not think it … Continue reading Next-generation library catalogs, or ‘Are we there yet?’ Fun with RSS and the RSS aggregator called Planet This posting outlines how I refined a number of my RSS feeds and then aggregated them into a coherent whole using Planet. Many different RSS feeds I have, more or less, been creating RSS (Really Simple Syndication) feeds since 2002. My first foray was not really with RSS but rather with RDF.
At that time … Continue reading Fun with RSS and the RSS aggregator called Planet Book reviews for Web app development This is a set of tiny book reviews covering the topic of Web app development for the iPhone, iPad, and iPod Touch. Unless you’ve been living under a rock for the past three or four years, then you know of the increasing popularity of personal mobile computing devices. This has manifested itself through “smart phones” like … Continue reading Book reviews for Web app development Alex Lite (version 2.0) This posting describes Alex Lite (version 2.0) — a freely available, standards-compliant distribution of electronic texts and ebooks. A few years ago I created the first version of Alex Lite. Its primary purpose was to: 1) explore and demonstrate how to transform a particular flavor … Continue reading Alex Lite (version 2.0) Where in the world is the mail going? For a good time, I geo-located the subscribers from a number of mailing lists, and then plotted them on a Google map. In other words, I asked the question, “Where in the world is the mail going?” The answer was sort of surprising. I moderate/manage three library-specific mailing lists: Usability4Lib, Code4Lib, and NGC4Lib. This means … Continue reading Where in the world is the mail going? Constant chatter at Code4Lib As illustrated by the chart, it seems as if the chatter was constant during the most recent Code4Lib conference. For a good time and in the vein of text mining, I made an effort to collect as many tweets with the hash tag #c4l11 as well as the backchannel log files. (“Thanks, lbjay!”). I then … Continue reading Constant chatter at Code4Lib How “great” are the Great Books? In this posting I present two quantitative methods for denoting the “greatness” of a text. Through this analysis I learned that Aristotle wrote the greatest book. Shakespeare wrote seven of the top ten books when it comes to love.
And Aristophanes’s Peace is the most significant when it comes to war. Once calculated, this description … Continue reading How “great” are the Great Books? Code4Lib Conference, 2011 This posting documents my experience at the 2011 Code4Lib Conference, February 8-10 in Bloomington (Indiana). In a sentence, the Conference was well-organized, well-attended, and demonstrated the overall health and vitality of this loosely structured community. At the same time I think the format of the Conference will need to evolve if it expects to significantly … Continue reading Code4Lib Conference, 2011 Foray’s into parts-of-speech This posting is the first of my text mining essays focusing on parts-of-speech. Based on the most rudimentary investigations, outlined below, it seems as if there is not much utility in the classification and description of texts in terms of their percentage use of parts-of-speech. Background For the past year or so I have spent … Continue reading Foray’s into parts-of-speech Visualizing co-occurrences with Protovis This posting describes how I am beginning to visualize co-occurrences with a Javascript library called Protovis. Alternatively, I am trying to answer the question, “What did Henry David Thoreau say in the same breath when he used the word ‘walden’?” “In the same breath” Network diagrams are great ways to illustrate relationships. In such diagrams … Continue reading Visualizing co-occurrences with Protovis MIT’s SIMILE timeline widget For a good time, I took a stab at learning how to implement an MIT SIMILE timeline widget. This posting describes what I learned. Background The MIT SIMILE Widgets are a set of cool Javascript tools. There are tools for implementing “exhibits”, time plots, “cover flow” displays a la iTunes, a couple of other things, … Continue reading MIT’s SIMILE timeline widget Illustrating IDCC 2010 This posting illustrates the “tweets” assigned to the hash tag #idcc10.
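The “in the same breath” question posed above boils down to counting co-occurrences: which words fall within a few words of each appearance of a keyword. A minimal sketch, using a hypothetical snippet of text rather than Thoreau's actual Walden:

```python
import re
from collections import Counter

def same_breath(text, keyword, window=2):
    """Count the words appearing within `window` words of each occurrence
    of `keyword` -- the co-occurrences behind a network diagram."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, w in enumerate(words):
        if w == keyword:
            neighbors = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            counts.update(v for v in neighbors if v != keyword)
    return counts

text = "I went to Walden Pond. Walden was quiet, and the pond was frozen."
breaths = same_breath(text, "walden", window=2)
print(breaths.most_common(3))  # "pond" co-occurs most often
```

The resulting counts become the edge weights of a network diagram, which a library like Protovis then draws.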
I more or less just got back from the 6th International Data Curation Conference that took place in Chicago (Illinois). Somewhere along the line I got the idea of applying digital humanities computing techniques against the conference’s Twitter feed — hash tag #idcc10. After … Continue reading Illustrating IDCC 2010 Ruler & Compass by Andrew Sutton I most thoroughly enjoyed reading and recently learning from a book called Ruler & Compass by Andrew Sutton. The other day, while perusing the bookstore for a basic statistics book, I came across Ruler & Compass by Andrew Sutton. Having always been intrigued by geometry and the use of only a straight edge and compass … Continue reading Ruler & Compass by Andrew Sutton Text mining Charles Dickens This posting outlines how a person can do a bit of text mining against three works by Charles Dickens using a set of two Perl modules — Lingua::EN::Ngram and Lingua::Concordance. Lingua::EN::Ngram I recently wrote a Perl module called Lingua::EN::Ngram. Its primary purpose is to count all the ngrams (two-word phrases, three-word phrases, n-word phrases, etc.) … Continue reading Text mining Charles Dickens AngelFund4Code4Lib The second annual AngelFund4Code4Lib — a $1,500 stipend to attend Code4Lib 2011 — is now accepting applications. These are difficult financial times, but we don’t want this to dissuade people from attending Code4Lib. [1] Consequently a few of us have gotten together, pooled our resources, and made AngelFund4Code4Lib available. Applying for the stipend is easy. … Continue reading AngelFund4Code4Lib Crowd sourcing the Great Books This posting describes how crowd sourcing techniques are being used to determine the “greatness” of the Great Books. The Great Books of the Western World is a set of books authored by “dead white men” — Homer to Dostoevsky, Plato to Hegel, and Ptolemy to Darwin. 
[1] In 1952 each item in the set was … Continue reading Crowd sourcing the Great Books Great Books data set This posting makes the Great Books data set freely available. As described previously, I want to answer the question, “How ‘great’ are the Great Books?” In this case I am essentially equating “greatness” with statistical relevance. Specifically, I am using the Great Books of the Western World’s list of “great ideas” as search terms and … Continue reading Great Books data set ECDL 2010: A Travelogue This posting outlines my experiences at the European Conference on Digital Libraries (ECDL), September 7-9, 2010 in Glasgow (Scotland). From my perspective, many of the presentations were about information retrieval and metadata, and the advances in these fields felt incremental at best. This does not mean I did not learn anything, but it does reinforce … Continue reading ECDL 2010: A Travelogue Dan Marmion Dan Marmion recruited and hired me to work at the University of Notre Dame during the Summer of 2001. The immediate goal was to implement a “database-driven website”, which I did with the help of the Digital Access and Information Architecture Department staff and MyLibrary. About eighteen months after I started working at the University … Continue reading Dan Marmion Great Books data dictionary This is a sort of Great Books data dictionary in that it describes the structure and content of two data files containing information about the Great Books of the Western World. The data set is manifested in two files. The canonical file is great-books.xml. This XML file consists of a root element (great-books) and many … Continue reading Great Books data dictionary Twitter, Facebook, Delicious, and Alex I spent time last evening and this afternoon integrating Twitter, Facebook, and Delicious into my Alex Catalogue.
The process was (almost) trivial: create Twitter, Facebook, and Delicious accounts; select and configure the Twitter button I desired to use; acquire the Delicious javascript for bookmarking; and place the results of Steps #1 and #2 into my … Continue reading Twitter, Facebook, Delicious, and Alex Where in the world are windmills, my man Friday, and love? This posting describes how a Perl module named Lingua::Concordance allows the developer to illustrate where in the continuum of a text words or phrases appear and how often. Windmills, my man Friday, and love When it comes to Western literature and windmills, we often think of Don Quixote. When it comes to “my man Friday” … Continue reading Where in the world are windmills, my man Friday, and love? Ngrams, concordances, and librarianship This posting describes how the extraction of ngrams and the implementation of concordances are integrated into the Alex Catalogue of Electronic Texts. Given the increasing availability of full-text content in libraries, the techniques described here could easily be incorporated into traditional library “discovery systems” and/or catalogs, if and only if the library profession were to … Continue reading Ngrams, concordances, and librarianship Lingua::EN::Bigram (version 0.03) I uploaded version 0.03 of Lingua::EN::Bigram to CPAN today, and it now supports not just bigrams, trigrams, quadgrams, but ngrams — an arbitrary phrase length. In order to test it out, I quickly gathered together some of my more recent essays, concatenated them, and applied Lingua::EN::Bigram against the result. Below is a list of … Continue reading Lingua::EN::Bigram (version 0.03) Lingua::EN::Bigram (version 0.02) I have written and uploaded to CPAN version 0.02 of my Perl module Lingua::EN::Bigram. From the README file: This module is designed to: 1) pull out all of the two-, three-, and four-word phrases in a given text, and 2) list these phrases according to their frequency.
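The module described above is Perl, but the underlying idea (slide a window of n words across a text and tally the phrases) is tiny in any language. A sketch of the same idea in Python, with a hypothetical sample sentence:

```python
from collections import Counter

def ngrams(text, n):
    """List all n-word phrases in a text, ranked by frequency --
    the same idea as Lingua::EN::Bigram, sketched in Python."""
    words = text.lower().split()
    phrases = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(phrases)

bigrams = ngrams("to be or not to be that is the question", 2)
print(bigrams.most_common(2))  # "to be" occurs twice
```

Ranking the counts surfaces the recurring phrases that characterize a text, exactly what the CPAN module's frequency listing provides.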
Using this module it is possible to create … Continue reading Lingua::EN::Bigram (version 0.02) Cool URIs I have started implementing “cool” URIs against the Alex Catalogue of Electronic Texts. As outlined in Cool URIs for the Semantic Web, “The best resource identifiers… are designed with simplicity, stability and manageability in mind…” To that end I have taken to creating generic URIs redirecting user-agents to URLs based on content negotiation — 303 … Continue reading Cool URIs rsync, a really cool utility Without direct physical access to my co-located host, backing up and preserving Infomotions’ 150 GB website is challenging, but through the use of rsync things are a whole lot easier. rsync is a really cool utility, and thanks go to Francis Kayiwa who recommended it to me in the first place. “Thank you!” … Continue reading rsync, a really cool utility WiLSWorld, 2010 I had the recent honor, privilege, and pleasure of attending WiLSWorld (July 21-22, 2010 in Madison, Wisconsin), and this posting outlines my experiences there. In a sentence, I was pleased to see the increasing understanding of “discovery” interfaces defined as indexes as opposed to databases, and it is now my hope we — as a … Continue reading WiLSWorld, 2010 Digital Humanities 2010: A Travelogue I was fortunate enough to be able to attend a conference called Digital Humanities 2010 (London, England) between July 4th and 10th. This posting documents my experiences and take-aways. In a sentence, the conference provided a set of much needed intellectual stimulation and challenges as well as validated the soundness of my current research surrounding … Continue reading Digital Humanities 2010: A Travelogue How “great” is this article?
During Digital Humanities 2010 I participated in the THATCamp London Developers’ Challenge and tried to answer the question, “How ‘great’ is this article?” This posting outlines the functionality of my submission, links to a screen capture demonstrating it, and provides access to the source code. Given any text file — say an article from the … Continue reading How “great” is this article? ALA 2010 This is the briefest of travelogues describing my experience at the 2010 ALA Annual Meeting in Washington (DC). Pat Lawton and I gave a presentation at the White House Four Points Hotel on the “Catholic Portal”. Essentially it was a status report. We shared the podium with Jon Miller (University of Southern California) who described … Continue reading ALA 2010 Text mining against NGC4Lib I “own” a mailing list called NGC4Lib. Its purpose is to provide a forum for the discussion of all things “next generation library catalog”. As of this writing, there are about 2,000 subscribers. Lately I have been asking myself, “What sorts of things get discussed on the list and who participates in the discussion?” I … Continue reading Text mining against NGC4Lib The Next Next-Generation Library Catalog With the advent of the Internet and wide-scale availability of full-text content, people are overwhelmed with the amount of accessible data and information. Library catalogs can only go so far when it comes to delimiting what is relevant and what is not. Even when the most exact searches return 100s of hits, what is a … Continue reading The Next Next-Generation Library Catalog Measuring the Great Books This posting describes how I am assigning quantitative characteristics to texts in an effort to answer the question, “How ‘great’ are the Great Books?” In the end I make a plea for library science.
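One quantitative characteristic of the kind described above can be sketched simply: count how often a set of “great idea” words occurs in a text and normalize by the text's length. This toy scorer is only a stand-in for the actual measurements in the postings (which draw their idea list from the Great Books of the Western World); the idea words and sample sentence here are hypothetical:

```python
import re

def great_ideas_score(text, ideas=("love", "war", "justice")):
    """Score a text by the relative frequency of a set of "great idea"
    words. (Toy stand-in; the real idea list comes from the Great Books.)"""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in set(ideas))
    return hits / len(words)

print(great_ideas_score("War is war, and love endures."))
```

Because the score is normalized, long and short texts can be compared on the same footing, which is what makes ranking a whole corpus possible.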
Background With the advent of copious amounts of freely available plain text on the ‘Net comes the ability to “read” entire … Continue reading Measuring the Great Books Collecting the Great Books In an effort to answer the question, “How ‘great’ are the Great Books?”, I need to mirror the full texts of the Great Books. This posting describes the initial process I am using to do such a thing, but the important thing to note is that this process is more about librarianship than it is … Continue reading Collecting the Great Books Inaugural Code4Lib “Midwest” Regional Meeting I believe the Inaugural Code4Lib “Midwest” Regional Meeting (June 11 & 12, 2010 at the University of Notre Dame) was a qualified success. About twenty-six people attended. (At least that was the number of people who went to lunch.) They came from Michigan, Ohio, Iowa, Indiana, and Illinois. Julia Bauder won the prize for coming … Continue reading Inaugural Code4Lib “Midwest” Regional Meeting How “great” are the Great Books? In 1952 a set of books called the Great Books of the Western World was published. It was supposed to represent the best of Western literature and enable the reader to further their liberal arts education. Sixty volumes in all, it included works by Plato, Aristotle, Shakespeare, Milton, Galileo, Kepler, Melville, Darwin, etc. (See … Continue reading How “great” are the Great Books? Not really reading Using a number of rudimentary digital humanities computing techniques, I tried to practice what I preach and extract the essence from a set of journal articles. I feel like the process met with some success, but I was not really reading. The problem A set of twenty-one (21) essays on the future of academic librarianship … Continue reading Not really reading Cyberinfrastructure Days at the University of Notre Dame On Thursday and Friday, April 29 and 30, 2010 I attended a Cyberinfrastructure Days event at the University of Notre Dame.
Through this process my personal definition of “cyberinfrastructure” was updated, and my basic understanding of “digital humanities computing” was confirmed. This posting documents the experience. Day #1 – Thursday, April 29 The first day … Continue reading Cyberinfrastructure Days at the University of Notre Dame About Infomotions Image Gallery: Flickr as cloud computing This posting describes the whys and wherefores behind the Infomotions Image Gallery. Photography I was introduced to photography during library school, specifically, when I took a multi-media class. We were given film and movie cameras, told to use the equipment, and through the process learn about the medium. I took many pictures of very tall … Continue reading About Infomotions Image Gallery: Flickr as cloud computing Shiny new website Infomotions has a shiny new website, and the process to create it was not too difficult. The problem A relatively long time ago (in a galaxy far far away), I implemented an Infomotions website look & feel. Tabbed interface across the top. Local navigation down the left-hand side. Content in the middle. Footer along the … Continue reading Shiny new website Counting words When I talk about “services against text” I usually get blank stares from people. When I think about it more, many of the services I enumerate are based on the counting of words. Consequently, I spent some time doing just that — counting words. I wanted to analyze the content of a couple of the … Continue reading Counting words Great Ideas Coefficient This posting outlines a concept I call the Great Ideas Coefficient — an additional type of metadata used to denote the qualities of a text. Great Ideas Coefficient In the 1950s a man named Mortimer Adler and colleagues brought together what they thought were the most significant written works of Western civilization. They called this … Continue reading Great Ideas Coefficient My first ePub file I made available my first ePub file today. 
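Both the “Counting words” posting and the Great Ideas Coefficient above boil down to tallying word frequencies. Here is a minimal sketch in Python (the blog’s own hacks are in Perl); the list of “great idea” terms and the simple summing rule are illustrative assumptions, not Adler’s list or the author’s actual calculation:

```python
import re
from collections import Counter

def count_words(text):
    """Tokenize a text into lowercase words and count them."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def great_ideas_coefficient(text, idea_terms):
    """Sum the frequencies of a set of 'great idea' terms.

    A crude stand-in for the coefficient described above; the real
    scoring may weight or normalize the counts differently."""
    counts = count_words(text)
    return sum(counts[t] for t in idea_terms)

text = "Truth and beauty: the love of truth is the beginning of wisdom."
print(count_words(text).most_common(3))
print(great_ideas_coefficient(text, ["truth", "love", "beauty", "justice"]))
```

Counting words really is the foundation: every other service (word clouds, TFIDF, bigrams) starts from a table like the one `count_words` returns.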
Screen shot EPub is the current de facto standard file format for ebook readers. After a bit of reading, the format is not too difficult since all the files are plain-text XML files or images. The various metadata files are ePub-specific XML. The content is XHTML. The … Continue reading My first ePub file Alex Catalogue Widget I created my first Apple Macintosh Widget today — Alex Catalogue Widget. The tool is pretty rudimentary. Enter something into the field. Press return or click the Search button. See search results against the Alex Catalogue of Electronic Texts displayed in your browser. The development process reminded me of hacking in HyperCard. Draw things on … Continue reading Alex Catalogue Widget Michael Hart in Roanoke (Indiana) On Saturday, February 27, Paul Turner and I made our way to Roanoke (Indiana) to listen to Michael Hart tell stories about electronic texts and Project Gutenberg. This posting describes our experience. Roanoke and the library To celebrate its 100th birthday, the Roanoke Public Library invited Michael Hart of Project Gutenberg fame to share his … Continue reading Michael Hart in Roanoke (Indiana) Preservationists have the most challenging job In the field of librarianship, I think the preservationists have the most challenging job because it is fraught with the greatest number of unknowns. Twenty-eight (28) CDs mangled book As I am writing this posting, I am in the middle of an annual process — archiving the data I created from the previous year. This … Continue reading Preservationists have the most challenging job How to make a book (#2 of 3) This is the second of a three-part series on how to make a book. The first posting described and illustrated how to use a thermo-binding machine to make a book. This posting describes and illustrates how to “weave” a book together — folding and cutting (or tearing). The process requires no tools. No glue.
No … Continue reading How to make a book (#2 of 3) Good and best open source software What qualities and characteristics make for a “good” piece of open source software? And once that question is answered, then what pieces of library-related open source software can be considered “best”? I do not believe there is any single, most important characteristic of open source software that qualifies it to be denoted as “best”. Instead, … Continue reading Good and best open source software Valencia and Madrid: A Travelogue I recently had the opportunity to visit Valencia and Madrid (Spain) to share some of my ideas about librarianship. This posting describes some of the things I saw and learned along the way. La Capilla de San Francisco de Borja Capilla del Santo Cáliz LIS-EPI Meeting In Valencia I was honored to give the opening remarks … Continue reading Valencia and Madrid: A Travelogue Colloquium on Digital Humanities and Computer Science: A Travelogue On November 14-16, 2009 I attended the 4th Annual Chicago Colloquium on Digital Humanities and Computer Science at the Illinois Institute of Technology in Chicago. This posting outlines my experiences there, but in a phrase, I found the event to be very stimulating. In my opinion, libraries ought to be embracing the techniques described here … Continue reading Colloquium on Digital Humanities and Computer Science: A Travelogue Alex Catalogue collection policy This page lists the guidelines for including texts in the Alex Catalogue of Electronic Texts. Originally written in 1994, much of it is still valid today. Purpose The primary purpose of the Catalogue is to provide me with the means for demonstrating a concept I call arscience through American and English literature as well as … Continue reading Alex Catalogue collection policy Alex, the movie! Created circa 1998, this movie describes the purpose and scope of the Alex Catalogue of Electronic Texts.
While it comes off as rather pompous, the gist of what gets said is still valid and correct. Heck, the links even work. “Thanks Berkeley!” Collecting water and putting it on the Web (Part III of III) This is Part III of an essay about my water collection, specifically a summary, opportunities for future study, and links to the source code. Part I described the collection’s whys and hows. Part II described the process of putting it on the Web. Summary, future possibilities, and source code There is no doubt about it. … Continue reading Collecting water and putting it on the Web (Part III of III) Collecting water and putting it on the Web (Part II of III) This is Part II of an essay about my water collection, specifically the process of putting it on the Web. Part I describes the whys and hows of the collection. Part III is a summary, provides opportunities for future study, and links to the source code. Making the water available on the Web As a … Continue reading Collecting water and putting it on the Web (Part II of III) Collecting water and putting it on the Web (Part I of III) This is Part I of an essay about my water collection, specifically the whys and hows of it. Part II describes the process of putting the collection on the Web. Part III is a summary, provides opportunities for future study, and links to the source code. I collect water It may sound strange, but I … Continue reading Collecting water and putting it on the Web (Part I of III) Web-scale discovery services Last week (Tuesday, August 18) Marshall Breeding and I participated in a webcast sponsored by Serials Solutions and Library Journal on the topic of “‘Web-scale’ discovery services”.
Our presentations complemented one another in that we both described the current library technology environment and described how the creation of amalgamated indexes of book and journal article … Continue reading Web-scale discovery services How to make a book (#1 of 3) This is a series of posts where I will describe and illustrate how to make books. In this first post I will show you how to make a book with a thermo-binding machine. In the second post I will demonstrate how to make a book by simply tearing and folding paper. In the third installment, … Continue reading How to make a book (#1 of 3) Book review of Larry McMurtry’s Books I read with interest Larry McMurtry’s Books: A Memoir (Simon & Schuster, 2008), but from my point of view, I would be lying if I said I thought the book had very much to offer. The book’s 259 pages are divided into 109 chapters. I was able to read the whole thing in six or … Continue reading Book review of Larry McMurtry’s Books Browsing the Alex Catalogue The Alex Catalogue is browsable by author names, subject tags, and titles. Just select a browsable list, then a letter, and finally an item. Browsability is an important feature of any library catalog. It gives you an opportunity to see what the collection contains without entering a query. It is also possible to use browsability … Continue reading Browsing the Alex Catalogue Indexing and searching the Alex Catalogue The Alex Catalogue of Electronic Texts uses state-of-the-art software to index both the metadata and full text of its content. While the interface accepts complex Boolean queries, it is easier to enter a single word, a number of words, or a phrase.
The underlying software will interpret what you enter and do much of hard … Continue reading Indexing and searching the Alex Catalogue Microsoft Surface at Ball State A number of colleagues from the University of Notre Dame and I visited folks from Ball State University and Ohio State University to see, touch, and discuss all things Microsoft Surface. There were plenty of demonstrations surrounding music, photos, and page turners. The folks at Ball State were finishing up applications for the dedication of … Continue reading Microsoft Surface at Ball State Automatic metadata generation I have been having a great deal of success extracting keywords and two-word phrases from documents and assigning them as “subject headings” to electronic texts — automatic metadata generation. In many cases but not all, the set of assigned keywords I’ve created is just as good as, if not better than, the controlled vocabulary terms assigned … Continue reading Automatic metadata generation Alex on Google I don’t exactly know how or why Google sometimes creates nice little screen shots of Web home pages, but it created one for my Alex Catalogue of Electronic Texts. I’ve seen them for other sites on the Web, and some of them even contain search boxes. I wish I could get Google to make one … Continue reading Alex on Google Top Tech Trends for ALA Annual, Summer 2009 This is a list of Top Tech Trends for the ALA Annual Meeting, Summer 2009.* Green computing The amount of computing that gets done on our planet has a measurable carbon footprint, and many of us, myself included, do not know exactly how much heat our computers put off and how much energy they consume. … Continue reading Top Tech Trends for ALA Annual, Summer 2009 Mass Digitization Mini-Symposium: A Reverse Travelogue The Professional Development Committee of the Hesburgh Libraries at the University of Notre Dame hosted a “mini-symposium” on the topic of mass digitization on Thursday, May 21, 2009.
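The automatic metadata generation described above — extracting keywords from a document and proposing them as “subject headings” — can be approximated with nothing more than frequency counting. This Python sketch is a crude stand-in for the author’s Perl hacks (which reportedly used TFIDF-style weighting); the stopword list is an illustrative assumption:

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
             "are", "was", "for", "on", "with", "as", "by", "about"}

def candidate_subjects(text, n=5):
    """Return the n most frequent non-stopword words as candidate
    'subject headings' -- a crude form of automatic metadata generation."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS and len(w) > 2]
    return [w for w, _ in Counter(words).most_common(n)]

sample = ("The catalog describes books. Books and catalogs are metadata. "
          "Metadata about books helps discovery.")
print(candidate_subjects(sample, 3))
```

Comparing output like this against human-assigned controlled vocabulary terms is exactly the experiment the posting describes.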
This text documents some of what the speakers had to say. Given the increasingly wide availability of free full text information provided through mass digitization, the forum offered … Continue reading Mass Digitization Mini-Symposium: A Reverse Travelogue Lingua::EN::Bigram (version 0.01) Below is the POD (Plain Old Documentation) file describing a Perl module I wrote called Lingua::EN::Bigram. The purpose of the module is to: 1) extract all of the two-word phrases from a given text, and 2) rank each phrase according to its probability of occurrence. Very nice for doing textual analysis. For example, by applying … Continue reading Lingua::EN::Bigram (version 0.01) Lingua::Concordance (version 0.01) Below is a man page describing a Perl module I recently wrote called Lingua::Concordance (version 0.01). Given the increasing availability of full text books and journals, I think it behooves the library profession to aggressively explore the possibilities of providing services against text as a means of making the proverbial fire hose of information … Continue reading Lingua::Concordance (version 0.01) EAD2MARC This posting simply shares three hacks I’ve written to enable me to convert EAD files to MARC records, and ultimately add them to my “discovery” layer — VUFind — for the Catholic Portal: ead2marcxml.sh – Using xsltproc and a modified version of Terry Reese’s XSL stylesheet, converts all the EAD/.xml files in the current directory … Continue reading EAD2MARC Text mining: Books and Perl modules This posting simply lists some of the books I’ve read and Perl modules I’ve explored in regards to the field of text mining. Through my explorations of term frequency/inverse document frequency (TFIDF) I became aware of a relatively new field of study called text mining.
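The core idea of Lingua::EN::Bigram above — pull out every adjacent two-word phrase and score it — fits in a few lines. In this Python sketch, plain relative frequency stands in for the module’s probability scoring, so treat it as an approximation of the idea rather than the module’s actual math:

```python
import re
from collections import Counter

def bigrams(text):
    """Extract all adjacent two-word phrases from a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]

def ranked_bigrams(text):
    """Rank each two-word phrase by its relative frequency.

    Lingua::EN::Bigram scores phrases by probability of occurrence;
    simple relative frequency is used here as a stand-in."""
    counts = Counter(bigrams(text))
    total = sum(counts.values())
    return sorted(((c / total, b) for b, c in counts.items()), reverse=True)

text = "of the people by the people for the people"
for score, phrase in ranked_bigrams(text)[:3]:
    print(round(score, 3), phrase)
```

Two-word phrases are often more telling than single words — “library catalog” says more than either “library” or “catalog” alone — which is why the module is “very nice for doing textual analysis.”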
In many ways, text mining is similar to data mining … Continue reading Text mining: Books and Perl modules Internet Archive content in “discovery” systems This quick posting describes how Internet Archive content, specifically, content from the Open Content Alliance can be quickly and easily incorporated into local library “discovery” systems. VuFind is used here as the particular example: Get keys – The first step is to get a set of keys describing the content you desire. This can be … Continue reading Internet Archive content in “discovery” systems TFIDF In Libraries: Part III of III (For thinkers) This is the third of the three-part series on the topic of TFIDF in libraries. In Part I the whys and wherefores of TFIDF were outlined. In Part II TFIDF subroutines and programs written in Perl were used to demonstrate how search results can be sorted by relevance and automatic classification can be done. In … Continue reading TFIDF In Libraries: Part III of III (For thinkers) The decline of books [This posting is in response to a tiny thread on the NGC4Lib mailing list about the decline of books. –ELM] Yes, books are on the decline, but in order to keep this trend in perspective it is important to not confuse the medium with the message. The issue is not necessarily about books as much … Continue reading The decline of books Code4Lib Software Award: Loose ends Loose ends make me feel uncomfortable, and one of the loose ends in my professional life is the Code4Lib Software Award. Code4Lib began as a mailing list in 2003 and has grown to about 1,200 subscribers from all over the world. New people subscribe to the list almost daily. Its Web presence started up in … Continue reading Code4Lib Software Award: Loose ends TFIDF In Libraries: Part II of III (For programmers) This is the second of a three-part series called TFIDF In Libraries, where relevancy ranking techniques are explored through a set of simple Perl programs.
In Part I relevancy ranking was introduced and explained. In Part III additional word/document weighting techniques will be explored to the end of filtering search results or addressing the perennial … Continue reading TFIDF In Libraries: Part II of III (For programmers) Ralph Waldo Emerson’s Essays It was with great anticipation that I read Ralph Waldo Emerson’s Essays (both the First Series as well as the Second Series), but my expectations were not met. In a sentence I thought Emerson used too many words to say things that could have been expressed more succinctly. The Essays themselves are a set of … Continue reading Ralph Waldo Emerson’s Essays TFIDF In Libraries: Part I of III (For Librarians) This is the first of a three-part series called TFIDF In Libraries, where “relevancy ranking” will be introduced. In this part, term frequency/inverse document frequency (TFIDF) — a common mathematical method of weighing texts for automatic classification and sorting search results — will be described. Part II will illustrate an automatic classification system and simple … Continue reading TFIDF In Libraries: Part I of III (For Librarians) A day at CIL 2009 This documents my day-long experiences at the Computers in Libraries annual conference, March 31, 2009. In a sentence, the meeting was well-attended and covered a wide range of technology issues. Washington Monument The day began with an interview-style keynote address featuring Paul Holdengraber (New York Public Library) interviewed by Erik Boekesteijn (Library Concept Center). As … Continue reading A day at CIL 2009 Quick Trip to Purdue Last Friday, March 27, I was invited by Michael Witt (Interdisciplinary Research Librarian) at Purdue University to give a presentation to the library faculty on the topic of “next generation” library catalogs.
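The TFIDF formula at the heart of the three-part series above is compact: a term’s weight in a document is its frequency there, discounted by how common it is across the whole collection. This Python sketch uses the textbook weighting, tf × log(N/df); the series’ own Perl programs may normalize differently:

```python
import math
from collections import Counter

def tfidf(term, doc, corpus):
    """Score a term in one document (a list of words) against a corpus.

    TFIDF = (term frequency in the document) * log(number of documents
    / number of documents containing the term).  Textbook formulation."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0

corpus = [["cat", "sat", "mat"],
          ["cat", "cat", "hat"],
          ["dog", "ran", "far"]]
print(tfidf("cat", corpus[1], corpus))  # frequent here, but common corpus-wide
print(tfidf("hat", corpus[1], corpus))  # rarer term earns a higher idf
```

Sorting a document’s terms by this score gives automatic classification; sorting documents by a query term’s score gives relevancy-ranked search results — the two applications the series demonstrates.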
During the presentation I made an effort to have the participants ask and answer questions such as “What is the catalog?”, “What is … Continue reading Quick Trip to Purdue Library Technology Conference, 2009: A Travelogue This posting documents my experiences at the Library Technology Conference at Macalester College (St. Paul, Minnesota) on March 18-19, 2009. In a sentence, this well-organized regional conference provided professionals from near-by states an opportunity to listen, share, and discuss ideas concerning the use of computers in libraries. Wallace Library Dayton Center Day #1, Wednesday The … Continue reading Library Technology Conference, 2009: A Travelogue Code4Lib Open Source Software Award As a community, let’s establish the Code4Lib Open Source Software Award. Lots of good work gets produced by the Code4Lib community, and I believe it is time to acknowledge these efforts in some tangible manner. Our profession is full of awards for leadership, particular aspects of librarianship, scholarship, etc. Why not an award for the … Continue reading Code4Lib Open Source Software Award Code4Lib Conference, Providence (Rhode Island) 2009 This posting documents my experience at the Code4Lib Conference in Providence, Rhode Island between February 23-26, 2009. To summarize my experiences, I went away with a better understanding of linked data, it is an honor to be a part of this growing and maturing community, and finally, this conference is yet another example of the … Continue reading Code4Lib Conference, Providence (Rhode Island) 2009 Henry David Thoreau’s Walden As I sit here beside my fire at the cabin, I reflect on the experiences documented by Henry David Thoreau in his book entitled Walden. Being human On one level, the book is about a man who goes off to live in a small cabin by a pond named Walden.
It describes how he built … Continue reading Henry David Thoreau’s Walden Eric Lease Morgan’s Top Tech Trends for ALA Mid-Winter, 2009 This is a list of “top technology trends” written for ALA Mid-Winter, 2009. They are presented in no particular order. [This text was originally published on the LITA Blog, but it is duplicated here because “lots of copies keep stuff safe.” –ELM] Indexing with Solr/Lucene works well – Lucene seems to have become the gold … Continue reading Eric Lease Morgan’s Top Tech Trends for ALA Mid-Winter, 2009 YAAC: Yet Another Alex Catalogue I have implemented another version of my Alex Catalogue of Electronic Texts, more specifically, I have dropped the use of one indexer and replaced it with Solr/Lucene. See http://infomotions.com/alex/ This particular implementation does not have all the features of the previous one. No spell check. No thesaurus. No query suggestions. On the other hand, it … Continue reading YAAC: Yet Another Alex Catalogue ISBN numbers I’m beginning to think about ISBN numbers and the Alex Catalogue of Electronic Texts. For example, I can add ISBN numbers to Alex, link them to my (fledgling) LibraryThing collection, and display lists of recently added items here: Interesting, but I think the list will change over time, as new things get added to my … Continue reading ISBN numbers Fun with WebService::Solr, Part III of III This is the last of a three-part series providing an overview of a set of Perl modules called WebService::Solr. In Part I, WebService::Solr was introduced with two trivial scripts. Part II put forth two command line driven scripts to index and search content harvested via OAI.
Part III illustrates how to implement a Search/Retrieve via … Continue reading Fun with WebService::Solr, Part III of III Fun with WebService::Solr, Part II of III In this posting (Part II), I will demonstrate how to use WebService::Solr to create and search a more substantial index, specifically an index of metadata describing the content of the Directory of Open Access Journals. Part I of this series introduced Lucene, Solr, and WebService::Solr with two trivial examples. Part III will describe how to … Continue reading Fun with WebService::Solr, Part II of III Mr. Serials is dead. Long live Mr. Serials This posting describes the current state of the Mr. Serials Process. Background Round about 1994 when I was employed by the North Carolina State University Libraries, Susan Nutter, the Director, asked me to participate in an ARL Collection Analysis Project (CAP). The goal of the Project was to articulate a mission/vision statement for the Libraries … Continue reading Mr. Serials is dead. Long live Mr. Serials Fun with WebService::Solr, Part I of III This posting (Part I) is an introduction to a Perl module called WebService::Solr. In it you will learn a bit of what Solr is, how it interacts with Lucene (an indexer), and how to write two trivial Perl scripts: 1) an indexer, and 2) a search engine. Part II of this series will introduce less … Continue reading Fun with WebService::Solr, Part I of III Visit to Ball State University I took time yesterday to visit a few colleagues at Ball State University. Ball State, the movie! Over the past few months the names of some fellow librarians at Ball State University repeatedly crossed my path. The first was Jonathan Brinley who is/was a co-editor on Code4Lib Journal. The second was Kelley McGrath who was … Continue reading Visit to Ball State University A Day with OLE This posting documents my experience at Open Library Environment (OLE) project workshop that took place at the University of Chicago, December 11, 2008.
In a sentence, the workshop provided an opportunity to describe and flowchart a number of back-end library processes in an effort to help design an integrated library system. What is OLE full-scale … Continue reading A Day with OLE ASIS&T Bulletin on open source software The following is a verbatim duplication of an introduction I wrote for a special issue of the ASIS&T Bulletin on open source software in libraries. I appreciate the opportunity to bring the issue together because I sincerely believe open source software provides a way for libraries to have more control over their computing environment. This … Continue reading ASIS&T Bulletin on open source software Fun with the Internet Archive I’ve been having some fun with Internet Archive content. The process More specifically, I have created a tiny system for copying scanned materials locally, enhancing them with word clouds, indexing them, and providing access to the whole thing. Here is how it works: Identify materials of interest from the Archive and copy their URLs to … Continue reading Fun with the Internet Archive Snow blowing and librarianship I don’t exactly know why, but I enjoy snow blowing. snow blower I think it began when I was college. My freshman year I stayed on during January, earning money from Building & Grounds. For much of the time they simply said, “Go shovel some snow.” It was quiet, peaceful, and solitary. It was … Continue reading Snow blowing and librarianship Tarzan of the Apes This is a simple word cloud of Edgar Rice Burroughs’ Tarzan of the Apes: tarzan  little  clayton  great  jungle  before   WorldCat Hackathon I attended the first-ever WorldCat Hackathon on Friday and Saturday (November 7 & 8), and we attendees explored ways to take advantage of various public application programmer interfaces (APIs) supported by OCLC.
Web Services The WorldCat Hackathon was an opportunity for people to get together, learn about a number of OCLC-supported APIs, and take time … Continue reading WorldCat Hackathon VUFind at PALINET I attended a VUFind meeting at PALINET in Philadelphia today, November 6, and this posting summarizes my experiences there. As you may or may not know, VUFind is a “discovery layer” intended to be applied against a traditional library catalog. Originally written by Andrew Nagy of Villanova University, it has been adopted by a handful … Continue reading VUFind at PALINET Dinner with Google On Thursday, September 4 a person from Google named Jon Trowbridge gave a presentation at Notre Dame called “Making scientific datasets universally accessible and useful”. This posting reports on the presentation and dinner afterwards. The presentation Jon Trowbridge is a software engineer working for Google. He seems to be an open source software and an … Continue reading Dinner with Google MyLibrary: A Digital library framework & toolbox I recently had published an article in Information Technology and Libraries (ITAL) entitled “MyLibrary: A Digital library framework & toolkit” (volume 27, number 3, pages 12-24, September 2008). From the abstract: This article describes a digital library framework and toolkit called MyLibrary. At its heart, MyLibrary is designed to create relationships between information resources and … Continue reading MyLibrary: A Digital library framework & toolbox MBooks, revisited This posting makes available a stylesheet to render MARCXML from a collection of records called MBooks. In a previous post — get-mbooks.pl — I described how to use OAI-PMH to harvest MARC records from the MBooks project. The program works; it does what it is supposed to do. The MBooks collection is growing so I … Continue reading MBooks, revisited wordcloud.pl Attached should be a simple Perl script called wordcloud.pl. Initialize it with a hash of words and associated integers.
Output rudimentary HTML in the form of a word cloud. This hack was used to create the word cloud in a posting called “Last of the Mohicans and services against texts“. Last of the Mohicans and services against texts Here is a word cloud representing James Fenimore Cooper’s The Last of the Mohicans; A narrative of 1757. It is a trivial example of how libraries can provide services against documents, not just the documents themselves. scout  heyward  though  duncan  uncas  little  without  own  eyes  before  hawkeye  indian  young  magua  much  place  long  time  moment … Continue reading Last of the Mohicans and services against texts Crowd sourcing TEI files How feasible and/or practical do you think “crowd sourcing” TEI files would be? I like writing in my books. In fact, I even have a particular system for doing it. Circled things are the subjects of sentences. Squared things are proper nouns. Underlined things connected to the circled and squared things are definitions. Moreover, my … Continue reading Crowd sourcing TEI files Metadata and data structures It is important to understand the differences between metadata and data structures. This posting outlines some of the differences between the two. Introduction Every once in a while people ask me for advice that I am usually very happy to give because the answers usually involve succinctly articulating some of the things floating around in … Continue reading Metadata and data structures Origami is arscient, and so is librarianship To do origami well a person needs to apply both artistic and scientific methods to the process. The same holds true for librarianship. Arscience Arscience is a word I have coined to denote the salient aspects of both art and science. 
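The wordcloud.pl hack described above — take a hash of words and associated integers, output rudimentary HTML — translates directly into other languages. Here is a minimal sketch in Python; the point sizes and the linear scaling rule are assumptions, not the script’s actual values:

```python
def word_cloud_html(counts, min_pt=12, max_pt=48):
    """Given a mapping of words to integers, emit rudimentary HTML in
    which each word's font size is scaled linearly between min_pt and
    max_pt -- the same idea as wordcloud.pl, sketched in Python."""
    lo, hi = min(counts.values()), max(counts.values())
    spread = (hi - lo) or 1  # avoid division by zero for a single size
    spans = []
    for word, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        size = min_pt + (n - lo) * (max_pt - min_pt) // spread
        spans.append('<span style="font-size:%dpt">%s</span>' % (size, word))
    return "<div>%s</div>" % " ".join(spans)

print(word_cloud_html({"tarzan": 25, "jungle": 10, "clayton": 5}))
```

Feed it the output of a word-frequency count and you get a cloud like the Tarzan of the Apes and Last of the Mohicans examples above.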
It is a type of thinking — thinquing — that is both intuitive as … Continue reading Origami is arscient, and so is librarianship On the move with the Mobile Web On The Move With The Mobile Web by Ellyssa Kroski provides a nice overview of mobile technology and what it presently means for libraries. What is in the Report In my most recent list of top technology trends I mentioned mobile devices. Because of this Kroski had a copy of the Library Technology Report she … Continue reading On the move with the Mobile Web TPM — technological protection measures I learned a new acronym a few weeks ago — TPM — which stands for “technological protection measures”, and in the May 2008 issue of College & Research Libraries Kristin R. Eschenfelder wrote an article called “Every library’s nightmare?” and enumerated various types of protection measures employed by publishers to impede the use of electronic … Continue reading TPM — technological protection measures Against The Grain is not Against The Grain is not your typical library-related serial. Last year I had the opportunity to present at the 27th Annual Charleston Conference where I shared my ideas regarding the future of search and how some of those ideas can be implemented in “next-generation” library catalogs. In appreciation of my efforts I was given a one-year … Continue reading Against The Grain is not E-journal archiving solutions A JISC-funded report on e-journal archiving solutions is an interesting read, and it seems as if no particular solution is the hands-down “winner”. Terry Morrow, et al. recently wrote a report sponsored by JISC called “A Comparative study of e-journal archiving solutions“.
Its goal was to compare & contrast various technical solutions to archiving electronic … Continue reading E-journal archiving solutions Web 2.0 and “next-generation” library catalogs A First Monday article systematically comparing & contrasting Web 1.0 and Web 2.0 website technology recently caught my interest, and I think it points a way to making more informed decisions regarding “next-generation” library catalog interfaces and Internet-based library services in general. Web 1.0 versus Web 2.0 Graham Cormode and Balachander Krishnamurthy in “Key differences … Continue reading Web 2.0 and “next-generation” library catalogs Alex Lite: A Tiny, standards-compliant, and portable catalogue of electronic texts One of the beauties of XML is its ability to be transformed into other plain text files, and that is what I have done with a simple software distribution called Alex Lite. My TEI publishing system(s) A number of years ago I created a Perl-based TEI publishing system called “My personal TEI publishing system“. Create a database … Continue reading Alex Lite: A Tiny, standards-compliant, and portable catalogue of electronic texts Indexing MARC records with MARC4J and Lucene In anticipation of the eXtensible Catalog (XC) project, I wrote my first Java programs a few months ago to index MARC records, and you can download them from here. The first uses MARC4J and Lucene to parse and index MARC records. The second uses Lucene to search the index created from the first program. They … Continue reading Indexing MARC records with MARC4J and Lucene Encoded Archival Description (EAD) files everywhere I’m beginning to see Encoded Archival Description (EAD) files everywhere, but maybe it is because I am involved with a project called the Catholic Research Resources Alliance (CRRA). As you may or may not know, EAD files are the “MODS files” of the archival community.
These XML files provide the means to administratively describe archival … Continue reading Encoded Archival Description (EAD) files everywhere eXtensible Catalog (XC): A very transparent approach An article by Jennifer Bowen entitled “Metadata to support next-generation library resource discovery: Lessons from the eXtensible Catalog, Phase 1” appeared recently in Information Technology & Libraries, the June 2008 issue. [1] The article outlines next-steps for the XC Project and enumerates a number of goals for their “‘next-generation’ library catalog” application/system: provide access to … Continue reading eXtensible Catalog (XC): A very transparent approach Top Tech Trends for ALA (Summer ’08) Here is a non-exhaustive list of Top Technology Trends for the American Library Association Annual Meeting (Summer, 2008). These Trends represent general directions regarding computing in libraries — short-term future directions where, from my perspective, things are or could be going. They are listed in no priority order. “Bling” in your website – I hate … Continue reading Top Tech Trends for ALA (Summer ’08) Google Onebox module to search LDAP This posting describes a Google Search Appliance Onebox module for searching an LDAP directory. At my work I help administrate a Google Search Appliance. It is used to index the university’s website. The Appliance includes a functionality — called Onebox — allowing you to search multiple indexes and combine the results into a single Web page. … Continue reading Google Onebox module to search LDAP DLF ILS Discovery Internet Task Group Technical Recommendation I read with great interest the DLF ILS Discovery Internet Task Group (ILS-DI) Technical Recommendation [1], and I definitely think it is a step in the right direction for making the content of library systems more accessible.
With regard to the integrated systems of libraries, the primary purpose of the Recommendations is to: improve discovery … Continue reading DLF ILS Discovery Internet Task Group Technical Recommendation

HyperNote Pro: a text annotating HyperCard stack
In 1992 I wrote a HyperCard stack called HyperNote Pro. HyperNote allowed you to annotate plain text files, and it really was a hypertext system. Import a plain text file. Click a word to see a note. Option-click a word to create a note. Shift-click a word to create an image note. Option-shift-click a word … Continue reading HyperNote Pro: a text annotating HyperCard stack

Steve Cisler
This is a tribute to Steve Cisler, community builder and librarian. Late last week I learned from Paul Jones’s blog that Steve Cisler had died. He was a mentor to me, and I’d like to tell a few stories describing the ways he assisted me in my career. I met Steve in 1989 or so … Continue reading Steve Cisler

Code4Lib Journal Perl module (version .003)
I hacked together a Code4Lib Journal Perl module providing read-only access to the Journal’s underlying WordPress (MySQL) database. You can download the distribution, and the following is from the distribution’s README file: This is the README file for a Perl module called C4LJ — Code4Lib Journal. Code4Lib Journal is the refereed serial of the Code4Lib … Continue reading Code4Lib Journal Perl module (version .003)

Open Library, the movie!
For a good time, I created a movie capturing some of the things I saw while attending the Open Library Developer’s Meeting a few months ago. Introducing, Open Library, the movie!

get-mbooks.pl
A few months ago I wrote a program called get-mbooks.pl, and it was used to harvest MARC data from the University of Michigan’s OAI repository of public domain Google Books. You can download the program here, and what follows is the distribution’s README file: This is the README file for a script called get-mbooks.pl. This … Continue reading get-mbooks.pl

Hello, World!
Hello, World! It is nice to meet you.